lila b0c0baf9ab updating documentation

2026-04-01 18:02:12 +02:00

Decisions Log

A record of non-obvious technical decisions made during development, with reasoning. Intended to preserve context across sessions.

Tooling

Monorepo: pnpm workspaces (not Turborepo)

Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, plain pnpm workspace commands are sufficient and there is one less tool to configure and maintain.

TypeScript runner: `tsx` (not `ts-node`)

tsx is faster, requires no configuration, and uses esbuild under the hood. ts-node is older and more complex to configure. tsx does not do type checking — that is handled separately by tsc and the editor. Installed as a dev dependency in apps/api only.

ORM: Drizzle (not Prisma)

Drizzle is lighter — no binary, no engine. Queries map closely to SQL. Migrations are plain SQL files. Works naturally with Zod for type inference. Prisma would add Docker complexity (engine binary in containers) and abstraction that is not needed for this schema.

WebSocket: `ws` library (not Socket.io)

For rooms of 2–4 players, Socket.io's room management, transport fallbacks, and reconnection abstractions are unnecessary overhead. The WS protocol is defined explicitly as a Zod discriminated union in packages/shared, giving the same type safety guarantees. Reconnection logic is deferred to Phase 7.
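As a rough illustration of what a discriminated-union protocol provides, the sketch below narrows on a `type` field so each message shape gets its own fields. The project defines the real protocol with Zod in packages/shared; the message names and fields here are hypothetical, and the hand-written parser only stands in for schema validation.

```typescript
// Hypothetical client-to-server messages; the real union lives in packages/shared.
type ClientMessage =
  | { type: "join_room"; roomCode: string }
  | { type: "submit_answer"; questionId: string; answer: string };

// Hand-rolled stand-in for schema validation: returns null for anything malformed.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  switch (msg.type) {
    case "join_room":
      return typeof msg.roomCode === "string"
        ? { type: "join_room", roomCode: msg.roomCode }
        : null;
    case "submit_answer":
      return typeof msg.questionId === "string" && typeof msg.answer === "string"
        ? { type: "submit_answer", questionId: msg.questionId, answer: msg.answer }
        : null;
    default:
      return null;
  }
}

// Narrowing on `type` gives each branch access to its own fields only.
function summarize(msg: ClientMessage): string {
  switch (msg.type) {
    case "join_room":
      return `join ${msg.roomCode}`;
    case "submit_answer":
      return `answer ${msg.answer} to ${msg.questionId}`;
  }
}
```

A handler would call `parseClientMessage` on each incoming frame and drop or reject anything that comes back as `null`.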

Auth: OpenAuth (not rolling own JWT)

All auth is delegated to an OpenAuth service at auth.yourdomain.com. Providers: Google, GitHub. The API validates the JWT on every protected request. User rows are created or updated on first login, looked up by the sub claim (stored as users.openauth_sub rather than used as the primary key; see Data Model).

Docker

Multi-stage builds for monorepo context

Both apps/web and apps/api use multi-stage Dockerfiles (deps, dev, builder, runner) because:

The monorepo structure requires copying pnpm-workspace.yaml, root package.json, and cross-dependencies (packages/shared, packages/db) before installing
node_modules paths differ between host and container due to workspace hoisting
Stages allow caching pnpm install separately from source code changes

Vite as dev server (not Nginx)

In development, apps/web uses vite dev directly, not Nginx. Reasons:

Hot Module Replacement (HMR) requires Vite's WebSocket dev server
Source maps and error overlay need direct Vite integration
Nginx would add unnecessary proxy complexity for local dev

Production will use Nginx to serve static Vite build output.

Architecture

Express app structure: factory function pattern

app.ts exports a createApp() factory function. server.ts imports it and calls .listen(). This allows tests to import the app directly without starting a server, keeping tests isolated and fast.

Data model: `decks` separate from `terms` (not frequency_rank filtering)

Original approach: Store frequency_rank on terms table and filter by rank range for difficulty.

Problem discovered: WordNet/OMW frequency data is unreliable for language learning. Extraction produced results like:

Rank 1: "In" → "indio" (chemical symbol: Indium)
Rank 2: "Be" → "berillio" (chemical symbol: Beryllium)
Rank 7: "He" → "elio" (chemical symbol: Helium)

These are technically "common" in WordNet (every element is a noun) but useless for vocabulary learning.

Decision:

terms table stores ALL available OMW synsets (raw data, no frequency filtering)
decks table stores curated learning lists (A1, A2, B1, "Most Common 1000", etc.)
deck_terms junction table links terms to decks with position ordering
rooms.deck_id specifies which vocabulary deck a game uses

Benefits:

Curricula can come from external sources (CEFR lists, Oxford 3000, SUBTLEX)
Bad data (chemical symbols, obscure words) excluded at deck level, not schema level
Users can create custom decks later
Multiple difficulty levels without schema changes
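The terms / decks / deck_terms split can be sketched in memory as follows. This is only an illustration of the lookup, not the Drizzle schema: the row shapes and the `lemma` field name are assumptions.

```typescript
// Hypothetical row shapes mirroring the tables described above.
interface Term { id: number; lemma: string }
interface DeckTerm { deckId: number; termId: number; position: number }

// Returns a deck's terms in position order; terms not linked to the deck never appear.
function termsForDeck(deckId: number, terms: Term[], deckTerms: DeckTerm[]): Term[] {
  const byId = new Map(terms.map((t) => [t.id, t] as const));
  return deckTerms
    .filter((dt) => dt.deckId === deckId)
    .sort((a, b) => a.position - b.position)
    .map((dt) => byId.get(dt.termId))
    .filter((t): t is Term => t !== undefined);
}

// "indium" stays in the terms table but is simply never linked to a deck.
const sampleTerms: Term[] = [
  { id: 1, lemma: "dog" },
  { id: 2, lemma: "indium" },
  { id: 3, lemma: "house" },
];
const sampleDeckTerms: DeckTerm[] = [
  { deckId: 10, termId: 3, position: 1 },
  { deckId: 10, termId: 1, position: 2 },
];
const deckLemmas = termsForDeck(10, sampleTerms, sampleDeckTerms).map((t) => t.lemma);
```

Exclusion of bad data happens by omission from deck_terms, so the terms table itself needs no filtering column.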

Multiplayer mechanic: simultaneous answers (not buzz-first)

All players see the same question at the same time and submit independently. The server waits for all answers or a 15-second timeout, then broadcasts the result. This keeps the experience Duolingo-like and symmetric. A buzz-first mechanic was considered and rejected.
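A minimal sketch of the "all answers or timeout" rule, kept as pure functions so it can be tested without timers. The state shape and function names are hypothetical, and treating a player's first submission as final is an assumption, not something the decision above specifies.

```typescript
const ANSWER_TIMEOUT_MS = 15000; // the 15-second window described above

interface RoundState {
  playerIds: string[];
  startedAt: number;            // epoch milliseconds
  answers: Map<string, string>; // playerId -> submitted answer
}

// Records an answer. Assumption: first submission wins; unknown players are ignored.
function submitAnswer(round: RoundState, playerId: string, answer: string): void {
  if (round.playerIds.includes(playerId) && !round.answers.has(playerId)) {
    round.answers.set(playerId, answer);
  }
}

// The round closes when everyone has answered or the timeout has elapsed.
function isRoundOver(round: RoundState, now: number): boolean {
  const allAnswered = round.playerIds.every((id) => round.answers.has(id));
  return allAnswered || now - round.startedAt >= ANSWER_TIMEOUT_MS;
}
```

The server would check `isRoundOver` after each submission and once more when its timer fires, then broadcast the result.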

Room model: room codes (not matchmaking queue)

Players create rooms and share a human-readable code (e.g. WOLF-42) to invite friends. Auto-matchmaking via a queue is out of scope for MVP. Valkey is included in the stack and can support a queue in a future phase.
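One way to produce codes in the WOLF-42 style is sketched below. The word list, the two-digit range, and the retry cap are all assumptions made for illustration; the random source is injectable so tests can be deterministic.

```typescript
// Hypothetical word list; any set of short, unambiguous words would do.
const CODE_WORDS = ["WOLF", "BEAR", "HAWK", "LYNX"];

// Builds a code like WOLF-42 from a word and a two-digit number (10..99).
function generateRoomCode(random: () => number = Math.random): string {
  const word = CODE_WORDS[Math.floor(random() * CODE_WORDS.length)];
  const num = 10 + Math.floor(random() * 90);
  return `${word}-${num}`;
}

// Retries until the code is not already used by an open room, up to maxAttempts.
function generateUniqueRoomCode(
  inUse: Set<string>,
  random: () => number = Math.random,
  maxAttempts = 100,
): string {
  for (let i = 0; i < maxAttempts; i++) {
    const code = generateRoomCode(random);
    if (!inUse.has(code)) return code;
  }
  throw new Error("could not find a free room code");
}
```

With only four words and ninety numbers the code space is small; a real list would need to be large enough for the expected number of concurrent rooms.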

TypeScript Configuration

Base config: no `lib`, `module`, or `moduleResolution`

These are intentionally omitted from tsconfig.base.json because different packages need different values — apps/api uses NodeNext, apps/web uses ESNext/bundler (Vite), and mixing them in the base caused errors. Each package declares its own.

`outDir: "./dist"` per package

The base config originally had outDir: "dist", which resolved relative to the base file's location and so pointed at the root dist folder. Each package now overrides it with "./dist" so compiled output stays inside the package.

`apps/web` tsconfig: deferred to Vite scaffold

The web tsconfig was left as a placeholder and filled in after pnpm create vite generated tsconfig.json, tsconfig.app.json, and tsconfig.node.json. The generated files were then trimmed to remove options already covered by the base.

`rootDir: "."` on `apps/api`

Set explicitly to allow vitest.config.ts (which lives outside src/) to be included in the TypeScript program. Without it, TypeScript infers rootDir as src/ and rejects any file outside that directory.

ESLint

Two-config approach for `apps/web`

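The root eslint.config.mjs handles TypeScript linting across all packages. apps/web/eslint.config.js is kept as a local addition for React-specific plugins only: eslint-plugin-react-hooks and eslint-plugin-react-refresh. Note that ESLint flat config does not cascade or merge config files: a single config file applies per run (the one resolved from the working directory, or the one nearest each linted file when per-file config lookup is enabled). The React rules therefore apply when linting with apps/web/eslint.config.js, and the root rules apply there only if that file imports and spreads the root config.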

Coverage config at root only

Vitest coverage configuration lives in the root vitest.config.ts only. Individual package configs omit it to produce a single aggregated report rather than separate per-package reports.

`globals: true` with `"types": ["vitest/globals"]`

Using Vitest globals (describe, it, expect without imports) requires "types": ["vitest/globals"] in each package's tsconfig compilerOptions. It is set in apps/api, packages/shared, and packages/db, and for apps/web in tsconfig.app.json.

Known Issues / Dev Notes

glossa-web has no healthcheck

The web service in docker-compose.yml has no healthcheck defined. Reason: Vite's dev server (vite dev) has no built-in health endpoint. Unlike the API's /api/health, there's no URL to poll.

Workaround: depends_on uses the API healthcheck as a proxy. For production (Nginx), add a health endpoint or use a TCP port check.

Valkey memory overcommit warning

Valkey logs this on start in development:

WARNING Memory overcommit must be enabled for proper functionality

This is harmless in dev but should be fixed before production. The warning appears because vm.overcommit_memory is a kernel-wide setting: the container shares the host kernel, so Valkey sees the host's value, which defaults to 0.

Fix: Add to host /etc/sysctl.conf:

vm.overcommit_memory = 1

Then run sudo sysctl -p to apply it and restart the Valkey container.

Data Model

Users: internal UUID + openauth_sub (not sub as PK)

Original approach: Use OpenAuth sub claim directly as users.id (text primary key).

Problem: This embeds the auth provider in the primary key (e.g. "google|12345"). If OpenAuth changes the format or a second provider is added, the key change has to cascade through all FKs (rooms.host_id, room_players.user_id).

Decision:

users.id = internal UUID (stable FK target)
users.openauth_sub = text UNIQUE (auth provider claim)
Allows adding multiple auth providers per user later without FK changes
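The lookup-by-sub pattern can be sketched with an in-memory store. This is an illustration under assumptions, not the project's repository code: `UserStore` and `findOrCreateBySub` are hypothetical names, and the id generator is injected (in production it could be a UUID generator such as `crypto.randomUUID`).

```typescript
interface User { id: string; openauthSub: string }

// In-memory stand-in for the users table, indexed by the unique openauth_sub column.
class UserStore {
  private bySub = new Map<string, User>();
  private newId: () => string;

  constructor(newId: () => string) {
    this.newId = newId;
  }

  // Returns the existing user for this sub, or creates one with a fresh internal id.
  findOrCreateBySub(sub: string): User {
    const existing = this.bySub.get(sub);
    if (existing) return existing;
    const user: User = { id: this.newId(), openauthSub: sub };
    this.bySub.set(sub, user);
    return user;
  }
}
```

Foreign keys such as rooms.host_id point at `id`, so the value in `openauthSub` can change without touching any other table.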

Rooms: `updated_at` for stale recovery only

Most tables omit updated_at (unnecessary for MVP). rooms.updated_at is kept specifically for stale room recovery—identifying rooms stuck in in_progress status after server crashes.
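Stale room detection based on updated_at might look like the sketch below. Only the in_progress status comes from the text; the other status values, the field names, and the threshold are assumptions.

```typescript
// Only "in_progress" is taken from the text above; the other statuses are hypothetical.
type RoomStatus = "waiting" | "in_progress" | "finished";

interface Room { id: string; status: RoomStatus; updatedAt: number } // epoch milliseconds

// Rooms still marked in_progress whose last update is older than the threshold.
function findStaleRooms(rooms: Room[], now: number, staleAfterMs: number): Room[] {
  return rooms.filter(
    (r) => r.status === "in_progress" && now - r.updatedAt > staleAfterMs,
  );
}
```

A recovery job run at server start could pass the result to whatever cleanup step closes or resets those rooms.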

Translations: UNIQUE (term_id, language_code, text)

Allows multiple synonyms per language per term (e.g. "dog", "hound" for same synset). Prevents exact duplicate rows. Homonyms (e.g. "Lead" metal vs. "Lead" guide) are handled by different term_id values (different synsets), so no constraint conflict.

Decks: `source_language` + `validated_languages` (not `pair_id`)

Original approach: decks.pair_id references language_pairs, tying each deck to a single language pair.

Problem: One deck can serve multiple target languages as long as translations exist for all its terms. A pair_id FK would require duplicating the deck for each target language.

Decision:

decks.source_language — the language the wordlist was curated from (e.g. "en"). A deck sourced from an English frequency list is fundamentally different from one sourced from an Italian list.
decks.validated_languages — array of language codes (excluding source_language) for which full translation coverage exists across all terms in the deck. Recalculated and updated on every run of the generation script.
The language pair used for a quiz session is determined at session start, not at deck creation time.

Benefits:

One deck serves multiple target languages (e.g. en→it and en→fr) without duplication
validated_languages stays accurate as translation data grows
DB enforces via CHECK constraint that source_language is never included in validated_languages
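The coverage rule behind validated_languages can be sketched as a pure function: a language qualifies only when every term in the deck has at least one translation in it. The function name and row shape are hypothetical; the actual generation script may compute this in SQL instead.

```typescript
interface Translation { termId: number; languageCode: string }

// A language is validated when every deck term has at least one translation in it.
// The source language is skipped, matching the CHECK constraint described above.
function computeValidatedLanguages(
  deckTermIds: number[],
  translations: Translation[],
  sourceLanguage: string,
): string[] {
  const termIds = new Set(deckTermIds);
  const covered = new Map<string, Set<number>>(); // languageCode -> covered term ids
  for (const t of translations) {
    if (!termIds.has(t.termId) || t.languageCode === sourceLanguage) continue;
    let ids = covered.get(t.languageCode);
    if (!ids) {
      ids = new Set<number>();
      covered.set(t.languageCode, ids);
    }
    ids.add(t.termId);
  }
  return Array.from(covered.entries())
    .filter(([, ids]) => ids.size === termIds.size)
    .map(([lang]) => lang)
    .sort();
}
```

Because the result depends only on current translation rows, recomputing it on every run keeps the column accurate as data is added.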

Current State

Phase 0 complete. Phase 1 data pipeline complete.

Completed (Phase 1 — data pipeline)

Run extract-en-it-nouns.py locally → generates datafiles/en-it-nouns.json
Write Drizzle schema: terms, translations, language_pairs, term_glosses, decks, deck_terms
Write and run migration (includes CHECK constraints for pos, gloss_type)
Write packages/db/src/seed.ts (imports ALL terms + translations, NO decks)
Write packages/db/src/generating-decks.ts — idempotent deck generation script
- reads and deduplicates source wordlist
- matches words to DB terms (homonyms included)
- writes unmatched words to -missing file
- determines validated_languages by checking full translation coverage per language
- creates deck if it doesn't exist, adds only missing terms on subsequent runs
- recalculates and persists validated_languages on every run
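The "adds only missing terms on subsequent runs" step is what makes the script idempotent. A minimal sketch of that step follows, assuming positions start at 1 and new terms are appended after the current maximum; both are assumptions, as are the names.

```typescript
interface DeckTermRow { termId: number; position: number }

// Returns only the rows that need inserting; re-running with the same input adds nothing.
function addMissingTerms(existing: DeckTermRow[], matchedTermIds: number[]): DeckTermRow[] {
  const present = new Set(existing.map((r) => r.termId));
  let next = existing.reduce((max, r) => Math.max(max, r.position), 0) + 1;
  const added: DeckTermRow[] = [];
  for (const termId of matchedTermIds) {
    if (present.has(termId)) continue; // already in the deck, or a duplicate in this batch
    present.add(termId);
    added.push({ termId, position: next++ });
  }
  return added;
}
```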

Known data facts

Wordlist: 999 unique words after deduplication (1000 lines, 1 duplicate)
Term IDs resolved: 3171 (higher than word count due to homonyms)
Words not found in DB: 34
Italian (it) coverage: 3171 / 3171 — full coverage, included in validated_languages

Next (Phase 1 — API layer)

Define Zod response schemas in packages/shared
Implement DeckRepository.getTerms(deckId, limit, offset)
Implement QuizService.attachDistractors(terms)
Implement GET /language-pairs, GET /decks, GET /decks/:id/terms endpoints
Unit tests for QuizService
