lila/documentation/ARCHITECTURE.md
2026-05-16 01:59:43 +02:00

# Architecture
> How Lila is structured, how data flows, and why the boundaries are where they are.
---
## Monorepo Layout
```
lila/
├── apps/
│ ├── api/ — Express backend (HTTP + WebSocket)
│ └── web/ — React frontend (Vite, TanStack Router)
├── packages/
│ ├── shared/ — Zod schemas + constants (API/web contract)
│ └── db/ — Drizzle schema, migrations, models, seeding
├── data-pipeline/ — Kaikki extraction → enrichment → PostgreSQL sync
├── documentation/ — Project docs
├── Caddyfile — Reverse proxy routing
├── docker-compose.yml — Local dev stack
└── pnpm-workspace.yaml — Workspace definition
```
**Package boundaries:**
| Package | Owns | Consumed by |
| ----------------- | ----------------------------------------------------------------- | ------------------------------------- |
| `packages/shared` | Zod schemas, constants, derived TypeScript types | `apps/api`, `apps/web`, `packages/db` |
| `packages/db` | Drizzle schema, DB connection, all model/query functions | `apps/api` |
| `apps/api` | Router, controllers, services, error handling, WebSocket handlers | — |
| `apps/web` | React components, routes, client-side state | — |

**Rule:** `apps/api` never imports `drizzle-orm` for queries. It only calls functions exported from `packages/db`.

---
## Layered Architecture (HTTP)
```
HTTP Request
    ↓
Router      maps URL + HTTP method to a controller
    ↓
Controller  handles HTTP only: validates input (Zod safeParse),
            calls service, sends response or next(error)
    ↓
Service     business logic only: no HTTP, no direct DB access
    ↓
Model       database queries only: no business logic
    ↓
Database    PostgreSQL via Drizzle ORM
```
**The rule:** each layer only talks to the layer directly below it.
- **Controller** never touches the database.
- **Service** never reads `req.body`.
- **Model** never knows what a quiz is.
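
The layering rule can be sketched in a few lines. This is a minimal illustration, not the project's code: `termModel.getTerms`, `quizService.createQuiz`, and `startQuizController` are hypothetical names, an array stands in for PostgreSQL, and a hand-written check stands in for Zod's `safeParse` so the snippet runs without dependencies.

```typescript
// All names here are hypothetical; plain TypeScript stands in for Express, Zod, and Drizzle.
type TermRow = { id: number; text: string };

// Model: database queries only. An array stands in for PostgreSQL.
const termTable: TermRow[] = [
  { id: 1, text: "house" },
  { id: 2, text: "tree" },
  { id: 3, text: "river" },
];
const termModel = {
  getTerms(limit: number): TermRow[] {
    return termTable.slice(0, limit);
  },
};

// Service: business logic only. No request/response objects, no direct table access.
const quizService = {
  createQuiz(rounds: number): { questions: string[] } {
    const terms = termModel.getTerms(rounds);
    return { questions: terms.map((t) => `Translate: ${t.text}`) };
  },
};

// Controller: HTTP concerns only. Validates input, calls the service, shapes the response.
type HttpResponse = { status: number; body: unknown };

function startQuizController(reqBody: unknown): HttpResponse {
  const rounds = (reqBody as { rounds?: unknown } | null)?.rounds;
  if (typeof rounds !== "number" || !Number.isInteger(rounds) || rounds < 1) {
    return { status: 400, body: { error: "rounds must be a positive integer" } };
  }
  return { status: 200, body: quizService.createQuiz(rounds) };
}
```

Because the service receives a plain number instead of a request object, it can be unit-tested without any HTTP setup, and the model can be swapped without touching the controller.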
### Error Flow
```
Controller throws ValidationError (400) or calls next(error)
    ↓
Central errorHandler middleware in app.ts
    ↓
Maps AppError subclasses to HTTP status codes
Unknown errors → 500
```
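
A sketch of how the mapping step might work, assuming `AppError` carries its own status code. `AppError` and `ValidationError` are named in the flow above; `NotFoundError`, the `statusCode` field, and `toHttpResponse` are hypothetical, and the real handler is an Express middleware, not a plain function.

```typescript
// Base class for expected, client-facing errors. The statusCode field is an assumption.
class AppError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = new.target.name;
    this.statusCode = statusCode;
  }
}

class ValidationError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

// Hypothetical second subclass, included only to show the mapping is generic.
class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

// Core of a central error handler: known errors keep their status,
// anything else becomes a 500 with no internal details leaked.
function toHttpResponse(err: unknown): { status: number; body: { error: string } } {
  if (err instanceof AppError) {
    return { status: err.statusCode, body: { error: err.message } };
  }
  return { status: 500, body: { error: "Internal Server Error" } };
}
```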
---
## WebSocket Architecture
The WebSocket server is attached to the same Express HTTP server. It upgrades connections on the `/ws` path.
```
WS Connection Upgrade
    ↓
Auth middleware          validates Better Auth session from cookie
    ↓
Message Router           dispatches by `type` field (Zod discriminated union)
    ↓
Handler (lobby or game)  business logic, broadcasts state
    ↓
In-memory stores (lobby game state, game session state)
```
**Message protocol:** All WebSocket messages are validated against Zod schemas defined in `packages/shared/src/schemas/lobby.ts` and `packages/shared/src/schemas/game.ts`. Messages form a discriminated union keyed on the `type` field: the router switches on `type` and validates the payload against the corresponding schema.
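
The dispatch step might look roughly like the sketch below. The message types `lobby:join`, `lobby:start`, and `game:answer` appear in this document; the payload fields (`code`, `optionId`) and the functions `parseMessage` and `route` are assumptions, and a hand-rolled parser stands in for the Zod schema so the snippet runs on its own.

```typescript
// Payload fields are assumptions; only the `type` values come from this document.
type ClientMessage =
  | { type: "lobby:join"; code: string }
  | { type: "lobby:start" }
  | { type: "game:answer"; optionId: number };

// Hand-rolled stand-in for a Zod discriminated-union safeParse.
function parseMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  const type = msg["type"];
  const code = msg["code"];
  const optionId = msg["optionId"];
  switch (type) {
    case "lobby:join":
      return typeof code === "string" ? { type, code } : null;
    case "lobby:start":
      return { type };
    case "game:answer":
      return typeof optionId === "number" ? { type, optionId } : null;
    default:
      return null; // unknown message type
  }
}

// Router: switch on the discriminant; TypeScript narrows the payload in each branch.
function route(raw: string): string {
  const msg = parseMessage(raw);
  if (msg === null) return "error:invalid_message";
  switch (msg.type) {
    case "lobby:join":
      return `lobby handler: join ${msg.code}`;
    case "lobby:start":
      return "lobby handler: start";
    case "game:answer":
      return `game handler: answer ${msg.optionId}`;
  }
}
```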
**State storage:**
- **Lobby membership** — stored in PostgreSQL (`lobbies`, `lobby_players` tables) for durability
- **Game/room state** — stored in-memory (`InMemoryLobbyGameStore`, `InMemoryGameSessionStore`). Valkey migration is planned.
---
## Database Schema (Core)
**Concept:** Words are language-neutral concepts (`terms`) with per-language `translations`. Adding a new language requires no schema changes — only new rows.
### Core Tables
| Table | Purpose |
| -------------- | -------------------------------------------------------------------------------- |
| `terms` | Language-neutral concept: `id`, `pos` (noun/verb/adjective/adverb), `source`, `source_id` |
| `translations` | Per-language word: `term_id` (FK), `language_code`, `text`, `cefr_level` (A1–C2) |
| `term_glosses` | Per-language definition: `term_id` (FK), `language_code`, `text` |
| `decks` | Curated wordlists: `source_language`, `validated_languages`, frequency tier |
| `deck_terms` | Junction: which terms belong to which deck |
### Auth Tables (managed by Better Auth)
| Table | Purpose |
| -------------- | --------------------------------------------------------------------------------- |
| `user` | Account: `id`, `name`, `email`, `image` |
| `session` | Active sessions: `id`, `user_id`, `token`, `expires_at` |
| `account` | Social provider links: `user_id`, `provider` (google/github), `providerAccountId` |
| `verification` | Email verification tokens (unused for social-only auth) |

**Key constraints:**
- `language_code` is CHECK-constrained against `SUPPORTED_LANGUAGE_CODES` (`en`, `it`, `de`, `es`, `fr`)
- `pos` is CHECK-constrained against `SUPPORTED_POS` (`noun`, `verb`, `adjective`, `adverb`)
- `cefr_level` is a nullable `varchar(2)` with a CHECK restricting it to `A1` through `C2`
- `translations` has UNIQUE `(term_id, language_code, text)` — allows synonyms, prevents exact duplicates
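
To make the effect of these constraints concrete, here is a plain-TypeScript mirror of what the database would accept or reject for a `translations` row. It is an illustration only: the real checks live in the Drizzle schema and PostgreSQL, and `checkInsert` and the camelCase field names are hypothetical.

```typescript
const SUPPORTED_LANGUAGE_CODES = ["en", "it", "de", "es", "fr"] as const;
const CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"] as const;

type Translation = {
  termId: number;
  languageCode: string;
  text: string;
  cefrLevel: string | null; // nullable, like the column
};

// Returns the constraint a row would violate, or null if the insert would succeed.
function checkInsert(existing: Translation[], row: Translation): string | null {
  if (!(SUPPORTED_LANGUAGE_CODES as readonly string[]).includes(row.languageCode)) {
    return "language_code CHECK";
  }
  if (row.cefrLevel !== null && !(CEFR_LEVELS as readonly string[]).includes(row.cefrLevel)) {
    return "cefr_level CHECK";
  }
  const duplicate = existing.some(
    (e) => e.termId === row.termId && e.languageCode === row.languageCode && e.text === row.text,
  );
  return duplicate ? "UNIQUE (term_id, language_code, text)" : null;
}

// A synonym for the same term and language is fine; an exact duplicate is not.
const rows: Translation[] = [{ termId: 1, languageCode: "en", text: "house", cefrLevel: "A1" }];
const synonymCheck = checkInsert(rows, { termId: 1, languageCode: "en", text: "home", cefrLevel: null });
const duplicateCheck = checkInsert(rows, { termId: 1, languageCode: "en", text: "house", cefrLevel: "A1" });
```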
---
## Data Flow: Quiz Session
### Singleplayer
```
User clicks "Start Quiz"
POST /api/v1/game/start (GameRequestSchema: source_lang, target_lang, pos, difficulty, rounds)
gameController.validate → gameService.createGameSession
termModel.getGameTerms(filters) + termModel.getDistractors(filters)
Service shuffles options, stores session in GameSessionStore
Returns GameSession { sessionId, questions[] } — correct answer NEVER sent to frontend
User answers → POST /api/v1/game/answer (AnswerSubmissionSchema)
Service evaluates server-side, returns AnswerResult { isCorrect, correctOptionId, selectedOptionId }
```
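
The key property of this flow is that the answer key stays on the server. A minimal sketch, assuming hypothetical field names (`questionId`, `prompt`, `options`, numeric ids) and a module-level `Map` in place of the real `GameSessionStore`; only the `AnswerResult` fields are taken from the flow above.

```typescript
// Server-side shape: includes the answer key.
type StoredQuestion = {
  questionId: number;
  prompt: string;
  options: { optionId: number; text: string }[];
  correctOptionId: number;
};
// Client-facing shape: same question without the answer key.
type PublicQuestion = Omit<StoredQuestion, "correctOptionId">;

// Stand-in for the session store.
const sessions = new Map<string, StoredQuestion[]>();

function createGameSession(
  sessionId: string,
  questions: StoredQuestion[],
): { sessionId: string; questions: PublicQuestion[] } {
  sessions.set(sessionId, questions);
  // Copy only the public fields so correctOptionId never reaches the response.
  const publicQuestions = questions.map((q) => ({
    questionId: q.questionId,
    prompt: q.prompt,
    options: q.options,
  }));
  return { sessionId, questions: publicQuestions };
}

function evaluateAnswer(
  sessionId: string,
  questionId: number,
  selectedOptionId: number,
): { isCorrect: boolean; correctOptionId: number; selectedOptionId: number } {
  const question = sessions.get(sessionId)?.find((q) => q.questionId === questionId);
  if (!question) throw new Error("unknown session or question");
  return {
    isCorrect: selectedOptionId === question.correctOptionId,
    correctOptionId: question.correctOptionId,
    selectedOptionId,
  };
}
```

The correct option is revealed only in the answer result, after the player has committed to a choice.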
### Multiplayer
```
Host creates lobby → POST /api/v1/lobbies → returns room code
Players join via code → POST /api/v1/lobbies/:code/join
All players connect WebSocket → send lobby:join with room code
Server broadcasts lobby:state (player list) to all connections in room
Host clicks "Start" → WS lobby:start
Server generates questions via MultiplayerGameService, broadcasts game:question
Players submit answers via WS game:answer within the 15 s server-side timer
On all-answered or timeout → evaluate, broadcast game:answer_result
After N rounds → broadcast game:finished with final scores
```
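
The "all-answered or timeout" rule can be modeled as a small state object. This is a sketch under assumptions: the `Round` class and its methods are hypothetical, the clock is passed in as a number so the logic is testable without real timers, and only the 15-second limit comes from the flow above.

```typescript
const ROUND_TIMEOUT_MS = 15000; // 15 s server-side limit

class Round {
  private readonly answers = new Map<string, number>();
  private readonly players: string[];
  private readonly correctOptionId: number;
  private readonly startedAt: number;

  constructor(players: string[], correctOptionId: number, startedAt: number) {
    this.players = players;
    this.correctOptionId = correctOptionId;
    this.startedAt = startedAt;
  }

  // Records an answer. Late, duplicate, and unknown-player submissions are ignored.
  submit(playerId: string, optionId: number, now: number): boolean {
    if (now - this.startedAt >= ROUND_TIMEOUT_MS) return false;
    if (!this.players.includes(playerId) || this.answers.has(playerId)) return false;
    this.answers.set(playerId, optionId);
    return true;
  }

  // The round ends when everyone has answered or the timer has expired.
  isOver(now: number): boolean {
    return this.answers.size === this.players.length || now - this.startedAt >= ROUND_TIMEOUT_MS;
  }

  // Players who never answered count as incorrect.
  results(): Record<string, boolean> {
    const out: Record<string, boolean> = {};
    for (const player of this.players) {
      out[player] = this.answers.get(player) === this.correctOptionId;
    }
    return out;
  }
}
```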
---
## The `packages/shared` Contract
`packages/shared` is the **single source of truth** for all data shapes crossing the API boundary.
**What lives here:**
- `constants.ts`: `SUPPORTED_LANGUAGE_CODES`, `SUPPORTED_POS`, `DIFFICULTY_LEVELS`, `CEFR_LEVELS`, `GAME_ROUNDS`
- `schemas/game.ts`: `GameRequestSchema`, `GameSessionSchema`, `GameQuestionSchema`, `AnswerOptionSchema`, `AnswerSubmissionSchema`, `AnswerResultSchema`
- `schemas/lobby.ts`: `LobbyCreateSchema`, `LobbyJoinSchema`, `LobbyStateSchema`, `WebSocketMessageSchema` (discriminated union)
- `schemas/auth.ts`: Auth-related shared types
**Why this matters:** If a shape changes, TypeScript compilation fails in both `apps/api` and `apps/web`, so drift between the two is caught at build time instead of surfacing silently at runtime.
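
A small sketch of the idea, with everything in one file for brevity. In the real package the types are derived from Zod schemas; here a hand-written type stands in for that, the `AnswerResult` field types are assumptions, and `isPartOfSpeech`, `buildAnswerResult`, and `renderAnswerResult` are hypothetical functions.

```typescript
// Stand-in for packages/shared: a constant plus a type derived from it.
const SUPPORTED_POS = ["noun", "verb", "adjective", "adverb"] as const;
type PartOfSpeech = (typeof SUPPORTED_POS)[number]; // "noun" | "verb" | "adjective" | "adverb"

type AnswerResult = {
  isCorrect: boolean;
  correctOptionId: number; // id types are assumptions
  selectedOptionId: number;
};

// Runtime guard backed by the same constant the type is derived from.
function isPartOfSpeech(value: string): value is PartOfSpeech {
  return (SUPPORTED_POS as readonly string[]).includes(value);
}

// Stand-in for apps/api: produces the shared shape.
function buildAnswerResult(correctOptionId: number, selectedOptionId: number): AnswerResult {
  return { isCorrect: correctOptionId === selectedOptionId, correctOptionId, selectedOptionId };
}

// Stand-in for apps/web: consumes the same shape. Renaming a field in AnswerResult
// turns both this function and buildAnswerResult into compile errors.
function renderAnswerResult(result: AnswerResult): string {
  return result.isCorrect
    ? "Correct!"
    : `Wrong, the answer was option ${result.correctOptionId}`;
}
```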

---
## GameSessionStore Abstraction
The service layer stores session state through an interface, not a concrete implementation:
```typescript
interface GameSessionStore {
  createSession(session: GameSession): Promise<void>;
  getSession(sessionId: string): Promise<GameSession | null>;
  // ...
}
```
**Current:** `InMemoryGameSessionStore` — Map-based, lives in `apps/api` process memory. Lost on restart.
**Planned:** `ValkeyGameSessionStore` — Redis-compatible, persists across restarts, enables horizontal scaling.
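
A Map-based implementation of the two methods shown above could look like this. It is a sketch, not the project's class: the `GameSession` type is reduced to the two fields this document mentions, the elided interface members are left out, and `countQuestions` is a hypothetical caller added to show that service code depends only on the interface.

```typescript
// Reduced to the fields named in this document; the real type has a richer question shape.
type GameSession = { sessionId: string; questions: unknown[] };

interface GameSessionStore {
  createSession(session: GameSession): Promise<void>;
  getSession(sessionId: string): Promise<GameSession | null>;
}

// Lives in process memory, so its contents are lost on restart.
class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, GameSession>();

  async createSession(session: GameSession): Promise<void> {
    this.sessions.set(session.sessionId, session);
  }

  async getSession(sessionId: string): Promise<GameSession | null> {
    return this.sessions.get(sessionId) ?? null;
  }
}

// Hypothetical service function: it accepts any GameSessionStore, so a
// Valkey-backed store could be passed in later without changing this code.
async function countQuestions(store: GameSessionStore, sessionId: string): Promise<number> {
  const session = await store.getSession(sessionId);
  return session ? session.questions.length : 0;
}
```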
The same pattern applies to `LobbyGameStore` (lobby state).

---
## Key Design Decisions (Quick Reference)
| Decision | Where it's explained |
| --------------------------------- | ----------------------------- |
| Why Drizzle over Prisma | `DECISIONS.md` → ORM |
| Why `ws` over Socket.io | `DECISIONS.md` → WebSocket |
| Why server-side answer evaluation | `DECISIONS.md` → Architecture |
| Why Better Auth over Keycloak | `DECISIONS.md` → Auth |
| Why terms/translations schema | `DECISIONS.md` → Data Model |
| Why Caddy over Nginx/Traefik | `DECISIONS.md` → Deployment |
---
## Further Reading
- [DATA_PIPELINE.md](DATA_PIPELINE.md) — How vocabulary data gets from Kaikki into PostgreSQL
- [DEPLOYMENT.md](DEPLOYMENT.md) — Production infrastructure and ops
- [MODEL_STRATEGY.md](MODEL_STRATEGY.md) — LLM voter architecture for CEFR assignment
- [design/GAME_MODES.md](design/GAME_MODES.md) — Planned multiplayer modes