# 00 — Project Overview
> **Purpose:** Give any LLM instant context on what Lila is, what makes it different, and what's currently built vs. planned. Concatenate this file with the domain-specific files (01–06) and 99-current-task.md before handing it to an LLM.
>
> **Last updated:** 2026-05-15
>
> **Depends on:** Nothing (this is the entry point)
---
## What Lila Is
Lila is a vocabulary learning app with two core differentiators:
1. **Media-based practice** — Users learn vocabulary extracted from real media they love: a Shakira song, the first chapter of _Harry Potter_, an episode of _Breaking Bad_. The app extracts vocabulary from subtitles/lyrics/text and turns it into quiz questions.
2. **Multiplayer modes** — Users practice vocabulary together or competitively in real-time sessions (2–4 players, simultaneous answers, live scoring).

The core learning loop is Duolingo-style: a word appears in one language, and the user picks the correct translation from four choices.
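The loop above can be sketched by combining it with two principles listed under Key Architecture Principles: the language-neutral `terms`/`translations` model and server-side evaluation.

This is a simplified, dependency-free illustration, not the code in `apps/api`. Note the following assumptions:

- The `Term`, `PublicQuestion`, and `ServerQuestion` shapes are hypothetical.
- The `buildQuestion` and `isCorrect` helpers are hypothetical.
- Terms are held in memory rather than loaded through Drizzle.

```typescript
// Hypothetical, simplified shapes; the real schema lives in packages/db.
type LanguageCode = "en" | "it" | "de" | "es" | "fr";

// One language-neutral concept with its per-language words.
interface Term {
  id: number;
  translations: Partial<Record<LanguageCode, string>>;
}

// Sent to the client: prompt and options, never the answer.
interface PublicQuestion {
  prompt: string;
  options: string[];
}

// Kept on the server until the player answers.
interface ServerQuestion {
  question: PublicQuestion;
  correctIndex: number;
}

// Fisher-Yates shuffle on a copy; rng is injectable for deterministic tests.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

function buildQuestion(
  target: Term,
  pool: readonly Term[],
  from: LanguageCode,
  to: LanguageCode,
  rng: () => number = Math.random,
): ServerQuestion {
  const prompt = target.translations[from];
  const answer = target.translations[to];
  if (prompt === undefined || answer === undefined) {
    throw new Error(`term ${target.id} has no ${from} or ${to} translation`);
  }
  // Distractors: distinct target-language words taken from other terms.
  const candidates = Array.from(
    new Set(
      pool
        .filter((t) => t.id !== target.id)
        .map((t) => t.translations[to])
        .filter((w): w is string => w !== undefined && w !== answer),
    ),
  );
  const distractors = shuffle(candidates, rng).slice(0, 3);
  if (distractors.length < 3) {
    throw new Error("need at least three distractors for a four-choice question");
  }
  const options = shuffle([answer, ...distractors], rng);
  return { question: { prompt, options }, correctIndex: options.indexOf(answer) };
}

// Evaluation happens server-side against the stored index.
function isCorrect(q: ServerQuestion, chosenIndex: number): boolean {
  return chosenIndex === q.correctIndex;
}

const terms: Term[] = [
  { id: 1, translations: { en: "dog", it: "cane" } },
  { id: 2, translations: { en: "cat", it: "gatto" } },
  { id: 3, translations: { en: "house", it: "casa" } },
  { id: 4, translations: { en: "water", it: "acqua" } },
  { id: 5, translations: { en: "bread", it: "pane" } },
];

const q = buildQuestion(terms[0], terms, "en", "it");
console.log(q.question); // only this part would be serialized to the client
```

Keeping `correctIndex` in a server-only structure is one way to honour the server-side evaluation rule. The client can render the options but cannot read the answer from the payload.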
Live at [lilastudy.com](https://lilastudy.com).
---
## Current State (2026-05-15)
### What Works Today
- **Singleplayer quiz** — 5 language pairs (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
- **Multiplayer** — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
- **Auth** — Google + GitHub via Better Auth
- **Deployment** — Live on Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
- **Database** — PostgreSQL with Drizzle ORM, daily backups
### What's In Progress / Blocked
- **Kaikki data pipeline migration** — Replacing OpenWordNet/OMW with sense-disambiguated Kaikki data. Stage 1 (extract) and Stage 2 (reverse link) are complete on sample data. Stage 3 (enrich) is being rewritten around a sub-stage architecture.
- **Guest play** — No try-before-signup flow yet. Auth required for all game routes.
- **Game session store** — Still in-memory. Valkey container exists locally but not wired up.
- **Media ingestion** — Not started. No pipeline for subtitles/lyrics → vocab extraction yet.
### The Strategic Gap
The app is currently a **generic vocabulary quiz**. The media-based practice feature (the differentiator) does not exist yet. It depends on:
1. Kaikki pipeline reaching production (fixes translation quality)
2. A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
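No ingestion code exists yet. As a rough illustration of the first steps of item 2 (subtitles → text → vocabulary candidates), here is a hypothetical sketch with these assumptions:

- The input is in SRT subtitle format.
- Words are ranked by raw frequency after removing a stopword list.
- `extractSubtitleText` and `vocabCandidates` are illustrative names only.

A real prototype would more likely lemmatize tokens and match them against Kaikki senses.

```typescript
// Hypothetical prototype; assumes SRT input (cue number, timing line, text lines).
function extractSubtitleText(srt: string): string[] {
  return srt
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Drop blank separators, cue numbers, and "00:00:01,000 --> 00:00:03,000" timing lines.
    .filter((line) => line !== "" && !/^\d+$/.test(line) && !line.includes("-->"));
}

// Counts lowercase letter-only tokens, most frequent first (ties alphabetical).
function vocabCandidates(
  lines: readonly string[],
  stopwords: ReadonlySet<string>,
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    // \p{L} matches letters in any script; apostrophes and punctuation split tokens.
    for (const token of line.toLowerCase().match(/\p{L}+/gu) ?? []) {
      if (stopwords.has(token)) continue;
      counts.set(token, (counts.get(token) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}

const srt = [
  "1",
  "00:00:01,000 --> 00:00:03,000",
  "Ciao, come stai?",
  "",
  "2",
  "00:00:04,000 --> 00:00:06,000",
  "Sto bene, grazie. E tu, come stai?",
].join("\n");

const candidates = vocabCandidates(extractSubtitleText(srt), new Set(["e", "tu"]));
console.log(candidates.slice(0, 2)); // "come" and "stai" (2 occurrences each) rank first
```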
---
## Tech Stack
| Layer | Technology |
| ------------- | -------------------------------------------------------------- |
| Monorepo | pnpm workspaces |
| Frontend | React 18, Vite, TanStack Router, TanStack Query, Tailwind CSS |
| Backend | Node.js, Express, TypeScript, WebSockets (`ws` library) |
| Database | PostgreSQL + Drizzle ORM |
| Auth | Better Auth (Google + GitHub) |
| Validation | Zod (shared between frontend and backend in `packages/shared`) |
| Testing | Vitest, supertest |
| Deployment | Docker Compose, Caddy, Hetzner VPS |
| CI/CD | Forgejo Actions |
| Data Pipeline | Kaikki (Wiktionary) → SQLite (`pipeline.db`) → PostgreSQL |
---
## Repository Structure
```
lila/
├── apps/
│ ├── api/ — Express backend (HTTP + WebSocket)
│ └── web/ — React frontend (Vite, TanStack Router)
├── packages/
│ ├── shared/ — Zod schemas + constants (API/web contract)
│ └── db/ — Drizzle schema, migrations, models, seeding
├── data-pipeline/ — Kaikki extraction → enrichment → PostgreSQL sync
└── documentation/ — Project docs (human + AI-context branches)
```
**Key rule:** `packages/shared` is the single source of truth for all data shapes crossing the API boundary. Both frontend and backend import from it, so if a schema changes in a breaking way, TypeScript compilation fails in both apps at once.
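To illustrate the rule, the sketch below shows one contract that could live in `packages/shared`. It has three parts:

- a discriminated union of client-to-server WebSocket messages;
- a runtime parser;
- a server-side handler that switches on `type`.

To stay dependency-free it hand-rolls the parser. In the real package the union would presumably be inferred with `z.infer` from a `z.discriminatedUnion("type", [...])` schema and validated with `safeParse`. The message names (`lobby:join`, `game:answer`) are hypothetical.

```typescript
// Hypothetical message names; the real contract is defined with Zod in packages/shared.
type ClientMessage =
  | { type: "lobby:join"; roomCode: string }
  | { type: "game:answer"; questionId: string; optionIndex: number };

// Runtime check standing in for a Zod schema's safeParse: returns null on bad input.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  switch (msg.type) {
    case "lobby:join": {
      const roomCode = msg.roomCode;
      return typeof roomCode === "string" ? { type: "lobby:join", roomCode } : null;
    }
    case "game:answer": {
      const questionId = msg.questionId;
      const optionIndex = msg.optionIndex;
      return typeof questionId === "string" &&
        typeof optionIndex === "number" &&
        Number.isInteger(optionIndex) &&
        optionIndex >= 0
        ? { type: "game:answer", questionId, optionIndex }
        : null;
    }
    default:
      return null;
  }
}

// Server-side router. The `never` check makes the switch exhaustive: adding a
// variant to ClientMessage without handling it here fails type-checking.
function route(msg: ClientMessage): string {
  switch (msg.type) {
    case "lobby:join":
      return `join room ${msg.roomCode}`;
    case "game:answer":
      return `answer ${msg.optionIndex} to ${msg.questionId}`;
    default: {
      const unhandled: never = msg;
      throw new Error(`unhandled message: ${JSON.stringify(unhandled)}`);
    }
  }
}

const parsed = parseClientMessage('{"type":"lobby:join","roomCode":"ABCD"}');
if (parsed) console.log(route(parsed)); // join room ABCD
```

Because the frontend would import the same `ClientMessage` type when sending, a renamed field shows up as a type error on both sides rather than as a runtime mismatch.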
---
## Key Architecture Principles
1. **Layered architecture** — Router → Controller → Service → Model → Database. Each layer only talks to the layer below it.
2. **Server-side answer evaluation** — The correct answer is never sent to the frontend. All evaluation happens server-side.
3. **Zod discriminated unions for WebSockets** — All WS messages are typed via Zod schemas in `packages/shared`. The router switches on the `type` field.
4. **GameSessionStore abstraction** — Session state is stored through an interface (`InMemoryGameSessionStore` now, `ValkeyGameSessionStore` planned).
5. **Language-neutral data model** — `terms` are concepts; `translations` are per-language words. Adding a language requires no schema changes.
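Principle 4 can be sketched as follows, assuming a simplified `GameSession` shape. The real interface and session fields in `apps/api` may differ, and `recordCorrectAnswer` is a hypothetical service-layer function.

Two design choices in the sketch are worth noting:

- Every method is async, which lets a network-backed `ValkeyGameSessionStore` implement the same interface later without changing callers.
- The in-memory version copies sessions through a JSON round trip, so it behaves like a store that serializes state.

```typescript
// Hypothetical session shape; the real one in apps/api may carry more state.
interface GameSession {
  roomCode: string;
  players: string[];
  round: number;
  scores: Record<string, number>;
}

// Async so a network-backed store (e.g. Valkey) can satisfy the same contract.
interface GameSessionStore {
  get(roomCode: string): Promise<GameSession | undefined>;
  set(session: GameSession): Promise<void>;
  delete(roomCode: string): Promise<void>;
}

// Copies on the way in and out, mimicking a store that serializes to strings,
// so callers cannot mutate stored state without calling set().
const copy = (s: GameSession): GameSession => JSON.parse(JSON.stringify(s)) as GameSession;

class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, GameSession>();

  async get(roomCode: string): Promise<GameSession | undefined> {
    const s = this.sessions.get(roomCode);
    return s === undefined ? undefined : copy(s);
  }

  async set(session: GameSession): Promise<void> {
    this.sessions.set(session.roomCode, copy(session));
  }

  async delete(roomCode: string): Promise<void> {
    this.sessions.delete(roomCode);
  }
}

// Service code depends only on the interface, not on the concrete store.
async function recordCorrectAnswer(
  store: GameSessionStore,
  roomCode: string,
  playerId: string,
): Promise<number> {
  const session = await store.get(roomCode);
  if (session === undefined) throw new Error(`no session for room ${roomCode}`);
  const score = (session.scores[playerId] ?? 0) + 1;
  session.scores[playerId] = score;
  await store.set(session);
  return score;
}
```

This read-modify-write awaits between `get` and `set`, so two concurrent answers for the same room could overwrite each other. A production store would likely need per-room serialization of updates or atomic operations on the Valkey side.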
---
## Key Decisions (Summary)
| Topic | Decision | Why |
| ----------- | --------------------------- | --------------------------------------------- |
| ORM | Drizzle, not Prisma | No binary, no engine, closer to SQL |
| WebSocket   | `ws` library, not Socket.io | 2–4 players, explicit Zod protocol sufficient |
| Auth | Better Auth, not Keycloak | Embedded middleware, no separate service |
| Answer eval | Server-side only | Correct answer never sent to frontend |
| Data source | Kaikki, not OMW | Sense-disambiguated translations |
---
## Further Reading (AI-Context Files)
| File | What it covers |
| ---------------------------------------------------- | ------------------------------------------------------------ |
| [01-architecture.md](01-architecture.md) | Monorepo structure, layered architecture, data flow diagrams |
| [02-data-model.md](02-data-model.md) | Database schema, tables, relationships, constraints |
| [03-api-contract.md](03-api-contract.md) | REST endpoints, request/response schemas, Zod types |
| [04-websocket-protocol.md](04-websocket-protocol.md) | WS message types, game flow, auth, state management |
| [05-data-pipeline.md](05-data-pipeline.md) | Kaikki pipeline stages, enrich sub-stages, sync |
| [06-deployment.md](06-deployment.md) | Docker, Caddy, CI/CD, backups |
| [prompts/meta.md](prompts/meta.md) | How to work with LLMs on this codebase |
| [99-current-task.md](99-current-task.md) | Template: fill this out before giving a task to an LLM |