00 — Project Overview

Purpose: Give any LLM instant context on what Lila is, what makes it different, and what's currently built vs. planned. Concatenate this file with domain-specific files (01–06) and 99-current-task.md before handing to an LLM.
Last updated: 2026-05-15
Depends on: Nothing (this is the entry point)

What Lila Is

Lila is a vocabulary learning app with two core differentiators:

Media-based practice — Users learn vocabulary extracted from real media they love: a Shakira song, the first chapter of Harry Potter, an episode of Breaking Bad. The app extracts vocabulary from subtitles/lyrics/text and turns it into quiz questions.
Multiplayer modes — Users practice vocabulary together or competitively in real-time sessions (2–4 players, simultaneous answers, live scoring).

The core learning loop is Duolingo-style: a word appears in one language, the user picks the correct translation from four choices.
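
The four-choice loop can be sketched as follows. This is a minimal illustration, not the project's implementation: `QuizQuestion`, `shuffle`, and `buildQuestion` are hypothetical names, and the way distractors are chosen (a random draw from a pool) is an assumption.

```typescript
// Hypothetical shape for illustration; the real schemas live in packages/shared.
interface QuizQuestion {
  prompt: string;    // word shown in the source language
  options: string[]; // four candidate translations, shuffled
}

// Fisher-Yates shuffle on a copy. The rng is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

function buildQuestion(
  prompt: string,
  correct: string,
  distractorPool: readonly string[],
  rng: () => number = Math.random,
): QuizQuestion {
  // Deduplicate the pool and keep the correct answer out of the distractors.
  const unique = distractorPool.filter(
    (word, index) => word !== correct && distractorPool.indexOf(word) === index,
  );
  if (unique.length < 3) {
    throw new Error("need at least three distinct distractors");
  }
  const distractors = shuffle(unique, rng).slice(0, 3);
  return { prompt, options: shuffle([correct].concat(distractors), rng) };
}
```

For example, `buildQuestion("cane", "dog", ["cat", "horse", "bird"])` returns the prompt plus four shuffled options, one of which is "dog".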

Live at lilastudy.com.

Current State (2026-05-15)

What Works Today

Singleplayer quiz — 5 languages in 4 pairs (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
Multiplayer — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
Auth — Google + GitHub via Better Auth
Deployment — Live on Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
Database — PostgreSQL with Drizzle ORM, daily backups
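
The multiplayer round rules listed above (simultaneous answers, 15s server timer, live scoring) could look roughly like this. It is a sketch under assumptions: `RoundState` and the three functions are hypothetical names, and the scoring rule (one point per correct answer) is assumed, not taken from the codebase.

```typescript
const ROUND_DURATION_MS = 15000; // 15s server timer, as listed above

interface RoundState {
  playerIds: string[];          // 2 to 4 players
  startedAt: number;            // epoch ms, taken from the server clock
  answers: Map<string, number>; // playerId -> chosen option index
}

function isExpired(round: RoundState, now: number): boolean {
  return now - round.startedAt >= ROUND_DURATION_MS;
}

// Accept one answer per known player while the timer is still running.
function submitAnswer(
  round: RoundState,
  playerId: string,
  optionIndex: number,
  now: number,
): boolean {
  const isKnown = round.playerIds.indexOf(playerId) !== -1;
  if (!isKnown || isExpired(round, now) || round.answers.has(playerId)) {
    return false;
  }
  round.answers.set(playerId, optionIndex);
  return true;
}

// The round closes when everyone has answered or the timer runs out.
function isRoundOver(round: RoundState, now: number): boolean {
  return round.answers.size >= round.playerIds.length || isExpired(round, now);
}

// Assumed scoring rule: one point per correct answer, zero otherwise.
function scoreRound(round: RoundState, correctIndex: number): Record<string, number> {
  const points: Record<string, number> = {};
  for (const id of round.playerIds) {
    points[id] = round.answers.get(id) === correctIndex ? 1 : 0;
  }
  return points;
}
```

Passing `now` in as a parameter keeps the timing logic testable without real timers; the server would supply its own clock.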

What's In Progress / Blocked

Kaikki data pipeline migration — Replacing OpenWordNet/OMW with sense-disambiguated Kaikki data. Stage 1 (extract) and Stage 2 (reverse link) complete on sample data. Stage 3 (enrich) being rewritten for sub-stage architecture.
Guest play — No try-before-signup flow yet. Auth required for all game routes.
Game session store — Still in-memory. Valkey container exists locally but not wired up.
Media ingestion — Not started. No pipeline for subtitles/lyrics → vocab extraction yet.

The Strategic Gap

The app is currently a generic vocabulary quiz. The media-based practice feature (the differentiator) does not exist yet. It depends on:

Kaikki pipeline reaching production (fixes translation quality)
A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
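
Since media ingestion is not started, the following is only a hypothetical sketch of the "subtitles → text → vocab extraction" step, assuming SRT input and a simple frequency count. `extractVocabulary` is an invented name, and real extraction would likely need lemmatization and linking to pipeline terms, which this sketch does not attempt.

```typescript
// Hypothetical sketch: turn raw SRT subtitle text into candidate vocabulary
// with occurrence counts. Not part of the current codebase.
function extractVocabulary(srt: string, minLength: number = 3): Map<string, number> {
  const counts = new Map<string, number>();
  for (const rawLine of srt.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines, cue numbers, and timestamp lines ("00:00:01,000 --> ...").
    if (line === "" || /^\d+$/.test(line) || line.indexOf("-->") !== -1) {
      continue;
    }
    // Drop simple markup such as <i>...</i>, then lowercase.
    const text = line.replace(/<[^>]+>/g, " ").toLowerCase();
    // Split on anything that is not a basic or Latin-1 accented letter or an apostrophe.
    for (const word of text.split(/[^a-z\u00df-\u00f6\u00f8-\u00ff']+/)) {
      if (word.length < minLength) {
        continue; // also drops the empty strings produced by split
      }
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}
```

The resulting word counts would then have to be matched against terms from the data pipeline before they could feed the quiz.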

Tech Stack

Layer	Technology
Monorepo	pnpm workspaces
Frontend	React 18, Vite, TanStack Router, TanStack Query, Tailwind CSS
Backend	Node.js, Express, TypeScript, WebSockets (`ws` library)
Database	PostgreSQL + Drizzle ORM
Auth	Better Auth (Google + GitHub)
Validation	Zod (shared between frontend and backend in `packages/shared`)
Testing	Vitest, supertest
Deployment	Docker Compose, Caddy, Hetzner VPS
CI/CD	Forgejo Actions
Data Pipeline	Kaikki (Wiktionary) → SQLite (`pipeline.db`) → PostgreSQL

Repository Structure

lila/
├── apps/
│   ├── api/              — Express backend (HTTP + WebSocket)
│   └── web/              — React frontend (Vite, TanStack Router)
├── packages/
│   ├── shared/           — Zod schemas + constants (API/web contract)
│   └── db/               — Drizzle schema, migrations, models, seeding
├── data-pipeline/        — Kaikki extraction → enrichment → PostgreSQL sync
└── documentation/        — Project docs (human + AI-context branches)

Key rule: packages/shared is the single source of truth for all data shapes crossing the API boundary. Both frontend and backend import from it, so a breaking schema change surfaces as a TypeScript compile error in every app that still uses the old shape.

Key Architecture Principles

Layered architecture — Router → Controller → Service → Model → Database. Each layer only talks to the layer below it.
Server-side answer evaluation — The correct answer is never sent to the frontend. All evaluation happens server-side.
Zod discriminated unions for WebSockets — All WS messages are typed via Zod schemas in packages/shared. The router switches on the type field.
GameSessionStore abstraction — Session state is stored through an interface (InMemoryGameSessionStore now, ValkeyGameSessionStore planned).
Language-neutral data model — terms are concepts; translations are per-language words. Adding a language requires no schema changes.
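
Two of the principles above, the GameSessionStore abstraction and server-side answer evaluation, can be illustrated together. This is a sketch under assumptions: the `GameSession` fields, the method names on the interface, and the helper functions are hypothetical, and the Promise-based signatures are an assumption made so that a networked store could implement the same interface.

```typescript
// Hypothetical shapes for illustration; the real ones are defined in the repo.
interface GameSession {
  id: string;
  prompt: string;
  options: string[];
  correctIndex: number; // stays on the server
}

// What the client is allowed to see: everything except correctIndex.
type PublicQuestion = Omit<GameSession, "correctIndex">;

// Assumed interface: async so an in-memory and a networked store can share it.
interface GameSessionStore {
  get(id: string): Promise<GameSession | undefined>;
  set(session: GameSession): Promise<void>;
  delete(id: string): Promise<void>;
}

class InMemoryGameSessionStore implements GameSessionStore {
  private sessions = new Map<string, GameSession>();

  async get(id: string): Promise<GameSession | undefined> {
    return this.sessions.get(id);
  }
  async set(session: GameSession): Promise<void> {
    this.sessions.set(session.id, session);
  }
  async delete(id: string): Promise<void> {
    this.sessions.delete(id);
  }
}

// Build the payload sent to the frontend; the correct answer is left out.
function toPublicQuestion(session: GameSession): PublicQuestion {
  return { id: session.id, prompt: session.prompt, options: session.options };
}

// Evaluation reads the correct answer from the store, never from the client.
async function evaluateAnswer(
  store: GameSessionStore,
  sessionId: string,
  chosenIndex: number,
): Promise<boolean> {
  const session = await store.get(sessionId);
  if (!session) {
    throw new Error(`unknown session: ${sessionId}`);
  }
  return chosenIndex === session.correctIndex;
}
```

Because callers depend only on `GameSessionStore`, swapping the in-memory class for a Valkey-backed one should not require changes to the evaluation code.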

Key Decisions (Summary)

Topic	Decision	Why
ORM	Drizzle, not Prisma	No binary, no engine, closer to SQL
WebSocket	`ws` library, not Socket.io	2–4 players, explicit Zod protocol sufficient
Auth	Better Auth, not Keycloak	Embedded middleware, no separate service
Answer eval	Server-side only	Correct answer never sent to frontend
Data source	Kaikki, not OMW	Sense-disambiguated translations
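
The "explicit protocol" decision in the table above can be illustrated with a discriminated union and a router that switches on the type field. The project defines its messages as Zod schemas in packages/shared; to keep this sketch dependency-free, a hand-written parser stands in for Zod here. The message names (`lobby:join`, `game:answer`) and their fields are hypothetical.

```typescript
// Hypothetical message shapes, discriminated by the "type" field.
type ClientMessage =
  | { type: "lobby:join"; roomCode: string }
  | { type: "game:answer"; questionId: string; optionIndex: number };

// Hand-written stand-in for schema validation:
// returns null for anything that does not match a known message shape.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const msg = data as Record<string, unknown>;
  const type = msg["type"];
  if (type === "lobby:join") {
    const roomCode = msg["roomCode"];
    return typeof roomCode === "string" ? { type, roomCode } : null;
  }
  if (type === "game:answer") {
    const questionId = msg["questionId"];
    const optionIndex = msg["optionIndex"];
    return typeof questionId === "string" && typeof optionIndex === "number"
      ? { type, questionId, optionIndex }
      : null;
  }
  return null;
}

// The router switches on the discriminant; the never check makes the compiler
// flag any message type that is added to the union but not handled here.
function route(msg: ClientMessage): string {
  switch (msg.type) {
    case "lobby:join":
      return `join ${msg.roomCode}`;
    case "game:answer":
      return `answer ${msg.questionId} -> ${msg.optionIndex}`;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```

With only a handful of message types and 2–4 players per room, this kind of explicit protocol stays small enough to maintain by hand.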

Related Files

File	What it covers
01-architecture.md	Monorepo structure, layered architecture, data flow diagrams
02-data-model.md	Database schema, tables, relationships, constraints
03-api-contract.md	REST endpoints, request/response schemas, Zod types
04-websocket-protocol.md	WS message types, game flow, auth, state management
05-data-pipeline.md	Kaikki pipeline stages, enrich sub-stages, sync
06-deployment.md	Docker, Caddy, CI/CD, backups
prompts/meta.md	How to work with LLMs on this codebase
99-current-task.md	Template: fill this out before giving a task to an LLM
