lila/documentation/decisions.md
2026-04-15 05:16:29 +02:00


Decisions Log

A record of non-obvious technical decisions made during development, with reasoning. Intended to preserve context across sessions. Grouped by topic area.


Tooling

Monorepo: pnpm workspaces (not Turborepo)

Turborepo adds parallel task running and build caching on top of pnpm workspaces. For a two-app monorepo of this size, plain pnpm workspace commands are sufficient and there is one less tool to configure and maintain.

TypeScript runner: tsx (not ts-node)

tsx is faster, requires no configuration, and uses esbuild under the hood. ts-node is older and more complex to configure. tsx does not do type checking — that is handled separately by tsc and the editor. Installed as a dev dependency in apps/api only.

ORM: Drizzle (not Prisma)

Drizzle is lighter — no binary, no engine. Queries map closely to SQL. Migrations are plain SQL files. Works naturally with Zod for type inference. Prisma would add Docker complexity (engine binary in containers) and abstraction that is not needed for this schema.

WebSocket: ws library (not Socket.io)

For rooms of 24 players, Socket.io's room management, transport fallbacks, and reconnection abstractions are unnecessary overhead. The WS protocol is defined explicitly as a Zod discriminated union in packages/shared, giving the same type safety guarantees. Reconnection logic is deferred to Phase 7.
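The shape of that protocol can be sketched with plain TypeScript types — these mirror what the Zod discriminated union in packages/shared would infer, but the message names and fields here are illustrative, not the real protocol:

```typescript
// Hypothetical client message variants, discriminated on `type`.
type ClientMessage =
  | { type: "join"; roomCode: string; playerName: string }
  | { type: "answer"; questionId: string; optionId: number }
  | { type: "leave" };

// Exhaustive handling: adding a new variant without a matching case
// becomes a compile error via the `never` check in the default branch.
function describe(msg: ClientMessage): string {
  switch (msg.type) {
    case "join":
      return `${msg.playerName} joins room ${msg.roomCode}`;
    case "answer":
      return `answered option ${msg.optionId} for ${msg.questionId}`;
    case "leave":
      return "left the room";
    default: {
      const _exhaustive: never = msg;
      return _exhaustive;
    }
  }
}
```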

Auth: Better Auth (not OpenAuth or Keycloak)

Better Auth embeds as middleware in the Express API — no separate auth service or Docker container. It connects to the existing PostgreSQL via the Drizzle adapter and manages its own tables (user, session, account, verification). Social providers (Google, GitHub) are configured in a single config object. Session validation is a function call within the same process, not a network request. OpenAuth was considered but requires a standalone service and leaves user management to you. Keycloak is too heavy for a single-app project.


Docker

Multi-stage builds for monorepo context

Both apps/web and apps/api use multi-stage Dockerfiles (deps, dev, builder, runner) because the monorepo structure requires copying pnpm-workspace.yaml, root package.json, and cross-dependencies before installing. Stages allow caching pnpm install separately from source code changes.
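A sketch of the stage layout for apps/api — paths, package names, and build commands here are illustrative, not the actual Dockerfile:

```dockerfile
# deps: cache pnpm install until a manifest changes
FROM node:24-alpine AS deps
WORKDIR /repo
COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
COPY apps/api/package.json apps/api/
COPY packages/shared/package.json packages/shared/
COPY packages/db/package.json packages/db/
RUN corepack enable && pnpm install --frozen-lockfile

# builder: source changes invalidate only these layers
FROM deps AS builder
COPY . .
RUN pnpm --filter api build

# runner: minimal final image
FROM node:24-alpine AS runner
WORKDIR /app
COPY --from=builder /repo/apps/api/dist ./dist
CMD ["node", "dist/server.js"]
```

Because the manifests are copied before the source tree, editing application code reuses the cached `pnpm install` layer.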

Vite as dev server (not Nginx)

In development, apps/web uses vite dev directly, not Nginx. HMR requires Vite's WebSocket dev server. Production will use Nginx to serve static Vite build output.


Architecture

Express app structure: factory function pattern

app.ts exports a createApp() factory function. server.ts imports it and calls .listen(). This allows tests to import the app directly without starting a server (used by supertest).

Zod schemas belong in packages/shared

Both the API and frontend import from the same schemas. If the shape changes, TypeScript compilation fails in both places simultaneously — silent drift is impossible.

Server-side answer evaluation

The correct answer is never sent to the frontend in GameQuestion. It is only revealed in AnswerResult after the client submits. Prevents cheating and keeps game logic authoritative on the server.

safeParse over parse in controllers

parse throws a raw Zod error → ugly 500 response. safeParse returns a result object → clean 400 with early return via the error handler.

POST not GET for game start

GET requests have no body. Game configuration is submitted as a JSON body → POST is semantically correct.

Model parameters use shared types, not GameRequestType

The model layer should not know about GameRequestType — that's an HTTP boundary concern. Parameters are typed using the derived constant types (SupportedLanguageCode, SupportedPos, DifficultyLevel) exported from packages/shared.

Model returns neutral field names, not quiz semantics

getGameTerms returns sourceText / targetText / sourceGloss rather than prompt / answer / gloss. Quiz semantics are applied in the service layer. Keeps the model reusable for non-quiz features.

Asymmetric difficulty filter

Difficulty is filtered on the target (answer) side only. A word can be A2 in Italian but B1 in English, and what matters is the difficulty of the word being learned.

optionId as integer 0-3, not UUID

Options only need uniqueness within a single question; cheating prevented by shuffling, not opaque IDs.

questionId and sessionId as UUIDs

Globally unique, opaque, natural Valkey keys when storage moves later.

gloss is string | null rather than optional

Predictable shape on the frontend — always present, sometimes null.

GameSessionStore stores only the answer key

Minimal payload (questionId → correctOptionId) for easy Valkey migration. All methods are async even for the in-memory implementation, so the service layer is already written for Valkey.
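A minimal sketch of that store — the interface and method names are illustrative, but the point is the async surface over a synchronous Map, so a Valkey-backed class can later implement the same interface with no service-layer changes:

```typescript
// questionId -> correctOptionId, nothing else.
interface AnswerKeyStore {
  set(questionId: string, correctOptionId: number): Promise<void>;
  get(questionId: string): Promise<number | undefined>;
  delete(questionId: string): Promise<void>;
}

class InMemoryAnswerKeyStore implements AnswerKeyStore {
  private readonly keys = new Map<string, number>();

  async set(questionId: string, correctOptionId: number): Promise<void> {
    this.keys.set(questionId, correctOptionId);
  }
  async get(questionId: string): Promise<number | undefined> {
    return this.keys.get(questionId);
  }
  async delete(questionId: string): Promise<void> {
    this.keys.delete(questionId);
  }
}
```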

Distractors fetched per-question (N+1 queries)

Correct shape for the problem; 10 queries on local Postgres is negligible latency.

No fallback logic for insufficient distractors

Data volumes are sufficient; strict query throws if something is genuinely broken.

Distractor query excludes both term ID and answer text

Prevents duplicate options from different terms with the same translation.

Submit-before-send flow on frontend

User selects, then confirms. Prevents misclicks.

Multiplayer mechanic: simultaneous answers (not buzz-first)

All players see the same question at the same time and submit independently. The server waits for all answers or a 15-second timeout, then broadcasts the result. Keeps the experience symmetric.

Room model: room codes (not matchmaking queue)

Players create rooms and share a human-readable code (e.g. WOLF-42). Auto-matchmaking deferred.
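Code generation in the WOLF-42 style can be sketched as below — the word list and number range are assumptions, not the real implementation:

```typescript
// Hypothetical WORD-NN room code generator.
const CODE_WORDS = ["WOLF", "BEAR", "LYNX", "HAWK"] as const;

function generateRoomCode(rng: () => number = Math.random): string {
  const word = CODE_WORDS[Math.floor(rng() * CODE_WORDS.length)];
  const num = Math.floor(rng() * 90) + 10; // 10-99: always two digits
  return `${word}-${num}`;
}
```

Collision handling (regenerate if the code is already an active room) would live wherever rooms are created.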


Error Handling

AppError base class over error code maps

A statusCode on the error itself means the middleware doesn't need a lookup table. New error types are self-contained — one class, one status code. ValidationError (400) and NotFoundError (404) extend AppError.
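A sketch of the classes and what the middleware reduces to (Express wiring omitted; the mapping function is illustrative):

```typescript
// Each error carries its own status code — no lookup table needed.
class AppError extends Error {
  constructor(message: string, readonly statusCode: number) {
    super(message);
    this.name = new.target.name;
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

// The core of the error middleware: typed errors map themselves,
// anything unknown becomes a 500.
function toResponse(err: unknown): { status: number; body: { error: string } } {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  return { status: 500, body: { error: "Internal server error" } };
}
```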

next(error) over res.status().json() in controllers

Express requires explicit next(error) for async handlers — it does not catch async errors automatically. Centralises all error formatting in one middleware. Controllers stay clean: validate, call service, send response.

Zod .message over .issues[0]?.message

Returns all validation failures at once, not just the first. Output is verbose (raw JSON string) — revisit formatting post-MVP if the frontend needs structured { field, message }[] error objects.

Where errors are thrown

ValidationError is thrown in the controller (the layer that runs safeParse). NotFoundError is thrown in the service (the layer that knows whether a session or question exists). The service doesn't know about HTTP — it throws a typed error, and the middleware maps it to a status code.


Testing

Mocked DB for unit tests (not test database)

Unit tests mock @lila/db via vi.mock — the real database is never touched. Tests run in milliseconds with no infrastructure dependency. Integration tests with a real test DB are deferred post-MVP.

Co-located test files

gameService.test.ts lives next to gameService.ts, not in a separate __tests__/ directory. Convention matches the vitest default and keeps related files together.

supertest for endpoint tests

Uses createApp() factory directly — no server started. Tests the full HTTP layer (routing, middleware, error handler) with real request/response assertions.


TypeScript Configuration

Base config: no lib, module, or moduleResolution

Intentionally omitted from tsconfig.base.json because different packages need different values — apps/api uses NodeNext, apps/web uses ESNext/bundler (Vite). Each package declares its own.

outDir: "./dist" per package

The base config originally had outDir: "dist" which resolved relative to the base file location, pointing to the root dist folder. Overridden in each package with "./dist".

apps/web tsconfig: deferred to Vite scaffold

Filled in after pnpm create vite generated tsconfig files. The generated files were trimmed to remove options already covered by the base.

rootDir: "." on apps/api

Set explicitly to allow vitest.config.ts (outside src/) to be included in the TypeScript program.

Type naming: PascalCase

supportedLanguageCode → SupportedLanguageCode. TypeScript convention.

Primitive types: always lowercase

number not Number, string not String. The uppercase versions are object wrappers and not assignable to Drizzle's expected primitive types.

globals: true with "types": ["vitest/globals"]

Using Vitest globals requires "types": ["vitest/globals"] in each package's tsconfig. Added to apps/api, packages/shared, packages/db, and apps/web/tsconfig.app.json.


ESLint

Two-config approach for apps/web

Root eslint.config.mjs handles TypeScript linting across all packages. apps/web/eslint.config.js adds React-specific plugins only. ESLint flat config merges them by directory proximity.

Coverage config at root only

Vitest coverage configuration lives in the root vitest.config.ts only. Produces a single aggregated report.


Data Model

Users: Better Auth manages the user table

Better Auth creates and owns the user table (plus session, account, verification). The account table links social provider identities to users — one user can have both Google and GitHub linked. Other tables (rooms, stats) reference user.id via FK. No need to design a custom user schema or handle provider-specific claims manually.

Rooms: updated_at for stale recovery only

Most tables omit updated_at. rooms.updated_at is kept specifically for identifying rooms stuck in in_progress status after server crashes.

Translations: UNIQUE (term_id, language_code, text)

Allows multiple synonyms per language per term (e.g. "dog", "hound" for same synset). Prevents exact duplicate rows.

One gloss per term per language

The unique constraint on term_glosses was tightened from (term_id, language_code, text) to (term_id, language_code) to prevent left joins from multiplying question rows. Revisit if multiple glosses per language are ever needed.

Decks: source_language + validated_languages (not pair_id)

One deck can serve multiple target languages as long as translations exist for all its terms. source_language is the language the wordlist was curated from. validated_languages is recalculated on every generation script run. Enforced via CHECK: source_language is never in validated_languages.

Decks: wordlist tiers as scope (not POS-split decks)

One deck per frequency tier per source language (e.g. en-core-1000). POS, difficulty, and category are query filters applied inside that boundary. Decks must not overlap — each term appears in exactly one tier.

Decks: SUBTLEX as wordlist source (not manual curation)

The most common 1000 nouns in English are not the same 1000 nouns that are most common in Italian. SUBTLEX exists in per-language editions derived from subtitle corpora using the same methodology — making them comparable. en-core-1000 is built from SUBTLEX-EN, it-core-1000 from SUBTLEX-IT.

language_pairs table: dropped

Valid pairs are implicitly defined by decks.source_language + decks.validated_languages. The table was redundant.

Terms: synset_id nullable (not NOT NULL)

Non-WordNet terms won't have a synset ID. Postgres UNIQUE on a nullable column allows multiple NULL values.

Terms: source + source_id columns

Once multiple import pipelines exist (OMW, Wiktionary), synset_id alone is insufficient as an idempotency key. Unique constraint on the pair. Postgres allows multiple NULL pairs. synset_id remains for now — deprecate during a future pipeline refactor.

cefr_level on translations (not terms)

CEFR difficulty is language-relative, not concept-relative. "House" in English is A1, "domicile" is also English but B2 — same concept, different words, different difficulty. Added as nullable varchar(2) with CHECK.

Categories + term_categories: empty for MVP

Schema exists. Grammar maps to POS (already on terms), Media maps to deck membership. Thematic categories require a metadata source still under research.

CHECK over pgEnum for extensible value sets

ALTER TYPE enum_name ADD VALUE in Postgres is non-transactional — cannot be rolled back if a migration fails. CHECK constraints are fully transactional. Rule: pgEnum for truly static sets, CHECK for any set tied to a growing constant.
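The difference in practice, using an illustrative table and constraint name — extending a CHECK is a constraint swap that can run and roll back inside one transaction:

```sql
-- Initial constraint tied to the shared language constant.
ALTER TABLE translations
  ADD CONSTRAINT translations_language_code_check
  CHECK (language_code IN ('en', 'it'));

-- Adding a language later, fully transactional (unlike ALTER TYPE ... ADD VALUE):
BEGIN;
ALTER TABLE translations
  DROP CONSTRAINT translations_language_code_check;
ALTER TABLE translations
  ADD CONSTRAINT translations_language_code_check
  CHECK (language_code IN ('en', 'it', 'de'));
COMMIT;
```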

language_code always CHECK-constrained

Unlike source (only written by import scripts), language_code is a query-critical filter column. A typo would silently produce missing data. Rule: any column game queries filter on should be CHECK-constrained.

Unique constraints make explicit FK indexes redundant

Postgres automatically creates an index to enforce a unique constraint. A separate index on the leading column of an existing unique constraint adds no value.


Data Pipeline

Seeding v1: batch, truncate-based

For dev/first-time setup. Read JSON, batch inserts in groups of 500, truncate tables before each run. Simple and fast.

Key pitfalls encountered:

  • Duplicate key on re-run: truncate before seeding
  • onConflictDoNothing breaks FK references: when it skips a terms insert, the in-memory UUID is never written, causing FK violations on translations
  • forEach doesn't await: use for...of
  • Final batch not flushed: guard with if (termsArray.length > 0) after loop
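The forEach and final-flush pitfalls can be sketched together; `insertBatch` is a stand-in for the real Drizzle batch insert:

```typescript
// Batching loop with both fixes applied.
async function seedTerms(
  rows: { lemma: string }[],
  insertBatch: (batch: { lemma: string }[]) => Promise<void>,
  batchSize = 500,
): Promise<void> {
  let batch: { lemma: string }[] = [];
  for (const row of rows) {   // for...of — rows.forEach would not await
    batch.push(row);
    if (batch.length === batchSize) {
      await insertBatch(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {     // flush the final partial batch
    await insertBatch(batch);
  }
}
```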

Seeding v2: incremental upsert, multi-file

For production / adding languages. Extends the database without truncating. Each synset processed individually (no batching — need real term.id from DB before inserting translations). Filename convention: sourcelang-targetlang-pos.json.

CEFR enrichment pipeline

Staged ETL: extract-*.py → compare-*.py (quality gate) → merge-*.py (resolve conflicts) → enrich.ts (write to DB). Source priority: English en_m3 > cefrj > octanove > random; Italian it_m3 > italian.

Enrichment results: English 42,527/171,394 (~25%), Italian 23,061/54,603 (~42%). Both sufficient for MVP. Italian C2 has only 242 terms — noted as constraint for distractor algorithm.

Term glosses: Italian coverage is sparse

OMW gloss data is primarily English. English glosses: 95,882 (~100%), Italian: 1,964 (~2%). UI falls back to English gloss when no gloss exists for the user's language.

Glosses can leak answers

Some WordNet glosses contain the target-language word in the definition text (e.g. "Padre" in the English gloss for "father"). Address during post-MVP data enrichment — clean glosses, replace with custom definitions, or filter at service layer.

packages/db exports fix

The exports field must be an object, not an array:

"exports": {
  ".": "./src/index.ts",
  "./schema": "./src/db/schema.ts"
}

API Development: Problems & Solutions

  1. Messy API structure. Responsibilities bleeding across layers. Fixed with strict layered architecture.
  2. No shared contract. API could return different shapes silently. Fixed with Zod schemas in packages/shared.
  3. Type safety gaps. any types, Number vs number. Fixed with derived types from constants.
  4. getGameTerms in wrong package. Model queries in apps/api meant direct drizzle-orm dependency. Moved to packages/db/src/models/.
  5. Deck generation complexity. 12 decks assumed, only 2 needed. Then skipped entirely for MVP — query terms table directly.
  6. GAME_ROUNDS type conflict. z.enum() only accepts strings. Keep as strings, convert to number in service.
  7. Gloss join multiplied rows. Multiple glosses per term per language. Fixed by tightening unique constraint.
  8. Model leaked quiz semantics. Return fields named prompt/answer. Renamed to neutral sourceText/targetText.
  9. AnswerResult wasn't self-contained. Frontend needed selectedOptionId but schema didn't include it. Added.
  10. Distractor could duplicate correct answer. Different terms with same translation. Fixed with ne(translations.text, excludeText).
  11. TypeScript strict mode flagged Fisher-Yates shuffle. noUncheckedIndexedAccess treats result[i] as T | undefined. Fixed with non-null assertion + temp variable.
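The fix in item 11 looks roughly like this — a standard Fisher-Yates shuffle written so indexed reads (which noUncheckedIndexedAccess types as T | undefined) go through temps with non-null assertions:

```typescript
// Fisher-Yates shuffle, noUncheckedIndexedAccess-friendly.
function shuffle<T>(input: readonly T[]): T[] {
  const result = [...input];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = result[i]!; // safe: i and j are always in bounds here
    result[i] = result[j]!;
    result[j] = tmp;
  }
  return result;
}
```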

Known Issues / Dev Notes

lila-web has no healthcheck

Vite's dev server has no built-in health endpoint. depends_on uses API healthcheck as proxy. For production (Nginx), add a health endpoint or TCP port check.

Valkey memory overcommit warning

Harmless in dev. Fix before production: add vm.overcommit_memory = 1 to host /etc/sysctl.conf.


Open Research

Semantic category metadata source

Categories (animals, kitchen, etc.) are in the schema but empty. Options researched:

  1. WordNet domain labels — already in OMW, coarse and patchy
  2. Princeton WordNet Domains — ~200 hierarchical domains, freely available, meaningfully better
  3. Kelly Project — CEFR levels AND semantic fields, designed for language learning. Could solve frequency tiers and categories in one shot
  4. BabelNet / WikiData — rich but complex integration, licensing issues
  5. LLM-assisted categorization — fast and cheap at current term counts, not reproducible without saving output
  6. Hybrid (WordNet Domains + LLM gap-fill) — likely most practical
  7. Manual curation — full control, too expensive at scale

Current recommendation: research Kelly Project first. If coverage is insufficient, go with Option 6.

SUBTLEX → cefr_level mapping strategy

Raw frequency ranks need mapping to A1–C2 bands before tiered decks are meaningful. Decision pending.

Future extensions: morphology and pronunciation

All deferred post-MVP, purely additive (new tables referencing existing terms):

  • noun_forms — gender, singular, plural, articles per language (source: Wiktionary)
  • verb_forms — conjugation tables per language (source: Wiktionary)
  • term_pronunciations — IPA and audio URLs per language (source: Wiktionary / Forvo)

Deployment

Reverse proxy: Caddy (not Nginx, not Traefik)

Caddy provides automatic HTTPS via Let's Encrypt with zero configuration beyond specifying domain names. The entire Caddyfile is ~10 lines. Nginx would require manual certbot setup and more verbose config. Traefik's auto-discovery of Docker containers (via labels) is powerful but overkill for a stable three-service stack where routing rules never change. Caddy runs as a Docker container alongside the app — no native install.
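An illustrative Caddyfile in this shape — container names and ports are assumptions, not the deployed config:

```
lilastudy.com {
    reverse_proxy lila-web:80
}

api.lilastudy.com {
    reverse_proxy lila-api:3000
}

git.lilastudy.com {
    reverse_proxy forgejo:3000
}
```

Certificates for each named domain are obtained and renewed automatically; no TLS directives are needed.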

Subdomain routing (not path-based)

lilastudy.com serves the frontend, api.lilastudy.com serves the API, git.lilastudy.com serves Forgejo. Cleaner separation than path-based routing — any service can be moved to a different server just by changing DNS. Requires CORS configuration since the browser sees different origins, and cross-subdomain cookies via COOKIE_DOMAIN=.lilastudy.com. Wildcard DNS (*.lilastudy.com) means new subdomains require no DNS changes.

Frontend served by nginx:alpine (not Node, not Caddy)

Vite builds to static files. Serving them with nginx inside the container is lighter than running a Node process and keeps the container at ~7MB. Caddy could serve them directly, but using a separate container maintains the one-service-per-container principle and keeps Caddy's config purely about routing.

SPA fallback via nginx try_files

Without try_files $uri $uri/ /index.html, refreshing on /play returns 404 because there's no actual play file. Nginx serves index.html for all routes and lets TanStack Router handle client-side routing.
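A minimal server block showing the fallback — paths are the nginx:alpine defaults, and the rest of the real config may differ:

```nginx
server {
    listen 80;
    root /usr/share/nginx/html;
    index index.html;

    location / {
        # Serve the file if it exists, otherwise hand index.html
        # to the SPA so TanStack Router resolves the route.
        try_files $uri $uri/ /index.html;
    }
}
```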

Forgejo as git server + container registry (not GitHub, not Docker Hub)

Keeps everything self-hosted on one VPS. Forgejo's built-in package registry doubles as a container registry, eliminating a separate service. Git push and image push go to the same server.

Forgejo SSH on port 2222 (not 22)

Port 22 is the VPS's own SSH. Mapping Forgejo's SSH to 2222 avoids conflicts. Dev laptop ~/.ssh/config maps git.lilastudy.com to port 2222 so git commands work without specifying the port every time.
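The laptop-side mapping looks something like this (the `User git` line is an assumption about the Forgejo setup):

```
# ~/.ssh/config on the dev laptop
Host git.lilastudy.com
    Port 2222
    User git
```

With this in place, `git clone git@git.lilastudy.com:owner/repo.git` works without any port syntax.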

packages/db and packages/shared exports: compiled JS paths

Exports in both package.json files point to ./dist/src/index.js, not TypeScript source. In dev, tsx can run TypeScript, but in production Node cannot. This means packages must be built before the API starts in dev — acceptable since these packages change infrequently. Alternative approaches (conditional exports, tsconfig paths) were considered but added complexity for no practical benefit.

Environment-driven config for production vs dev

CORS origin, Better Auth base URL, cookie domain, API URL, and OAuth credentials are all read from environment variables with localhost fallbacks. The same code runs in both environments without changes. VITE_API_URL is the exception — it's baked in at build time via Docker build arg because Vite replaces import.meta.env at compile time, not runtime.

Cross-subdomain cookies

Better Auth's defaultCookieAttributes sets domain: .lilastudy.com in production (from env var COOKIE_DOMAIN). Without this, the auth cookie scoped to api.lilastudy.com wouldn't be sent on requests from lilastudy.com. The leading dot makes the cookie valid across all subdomains.


CI/CD

Forgejo Actions with SSH deploy (not webhooks, not manual)

CI builds images natively on the ARM64 VPS (no QEMU cross-compilation). The runner uses the host's Docker socket to build. After pushing to the registry, the workflow SSHs into the VPS to pull and restart containers. Webhooks were considered but add an extra listener service to maintain and secure. Manual deploy was the initial approach but doesn't scale with frequent pushes.

Dedicated CI SSH key

A separate ci-runner SSH key pair (not the developer's personal key) is used for CI deploys. The private key is stored in Forgejo's secrets. If compromised, only this key needs to be revoked — the developer's access is unaffected.

Runner config: docker_host: "automount" + valid_volumes + explicit config path

The Forgejo runner's automount setting mounts the host Docker socket into job containers. valid_volumes must include /var/run/docker.sock or the mount is blocked. The runner command must explicitly reference the config file (-c /data/config.yml) — without this flag, config changes are silently ignored. --group-add 989 in container options adds the host's docker group so job containers can access the socket.

Docker CLI installed per job (not baked into runner image)

The job container (node:24-bookworm) doesn't include Docker CLI. It's installed via apt-get install docker.io as the first workflow step. This adds ~20 seconds per run but avoids maintaining a custom runner image. The CLI sends commands through the mounted socket to the host's Docker engine.


Backups

pg_dump cron + dev laptop sync (not WAL archiving, not managed service)

Daily compressed SQL dumps with 7-day retention. Dev laptop auto-syncs new backups on login via rsync. Simple, portable, sufficient for current scale. WAL archiving gives point-in-time recovery but is complex to set up. Offsite storage (Hetzner Object Storage) is the planned next step — backups on the same VPS don't protect against VPS failure.