lila/documentation/ai-context/00-project-overview.md
2026-05-16 01:59:43 +02:00

00 — Project Overview

Purpose: Give any LLM instant context on what Lila is, what makes it different, and what's currently built vs. planned. Concatenate this file with domain-specific files (01–06) and 99-current-task.md before handing to an LLM.
Last updated: 2026-05-15
Depends on: Nothing (this is the entry point)


What Lila Is

Lila is a vocabulary learning app with two core differentiators:

  1. Media-based practice — Users learn vocabulary extracted from real media they love: a Shakira song, the first chapter of Harry Potter, an episode of Breaking Bad. The app extracts vocabulary from subtitles/lyrics/text and turns it into quiz questions.
  2. Multiplayer modes — Users practice vocabulary together or competitively in real-time sessions (2–4 players, simultaneous answers, live scoring).

The core learning loop is Duolingo-style: a word appears in one language, the user picks the correct translation from four choices.
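
As a rough illustration of this loop, the sketch below builds one four-choice question from a pool of terms. The `Term` and `Question` shapes and the `shuffle` and `buildQuestion` helpers are hypothetical names, not taken from the Lila codebase, and the real app may pick distractors differently (for example by POS or difficulty).

```typescript
// Hypothetical shapes for illustration; not taken from the Lila codebase.
interface Term {
  id: string;
  source: string; // word shown to the user
  target: string; // correct translation
}

interface Question {
  termId: string;
  prompt: string;
  choices: string[]; // four options in shuffled order
}

// Fisher-Yates shuffle on a copy. `random` is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Build one question: the correct translation plus three distinct distractors
// taken from the other terms in the pool.
function buildQuestion(
  term: Term,
  pool: readonly Term[],
  random: () => number = Math.random,
): Question {
  const candidates = Array.from(
    new Set(
      pool
        .filter((t) => t.id !== term.id && t.target !== term.target)
        .map((t) => t.target),
    ),
  );
  const distractors = shuffle(candidates, random).slice(0, 3);
  if (distractors.length < 3) {
    throw new Error("need at least three distinct distractors");
  }
  return {
    termId: term.id,
    prompt: term.source,
    choices: shuffle([term.target, ...distractors], random),
  };
}
```

The question object carries no marker for the correct choice, so the client receives only the prompt and the four options.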

Live at lilastudy.com.


Current State (2026-05-15)

What Works Today

  • Singleplayer quiz — 5 language pairs (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
  • Multiplayer — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
  • Auth — Google + GitHub via Better Auth
  • Deployment — Live on Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
  • Database — PostgreSQL with Drizzle ORM, daily backups

What's In Progress / Blocked

  • Kaikki data pipeline migration — Replacing OpenWordNet/OMW with sense-disambiguated Kaikki data. Stage 1 (extract) and Stage 2 (reverse link) complete on sample data. Stage 3 (enrich) being rewritten for sub-stage architecture.
  • Guest play — No try-before-signup flow yet. Auth required for all game routes.
  • Game session store — Still in-memory. Valkey container exists locally but not wired up.
  • Media ingestion — Not started. No pipeline for subtitles/lyrics → vocab extraction yet.

The Strategic Gap

The app is currently a generic vocabulary quiz. The media-based practice feature (the differentiator) does not exist yet. It depends on:

  1. Kaikki pipeline reaching production (fixes translation quality)
  2. A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
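
To make the second dependency concrete, here is a minimal sketch of the "subtitles → text → vocab extraction" step. It is purely hypothetical: no such pipeline exists in Lila yet, `extractVocabulary` is an invented name, and a real prototype would need lemmatization and a lookup against the Kaikki data rather than raw word counts.

```typescript
// Hypothetical helper: Lila has no media ingestion pipeline yet.
// Takes SRT-style subtitle text and returns [word, count] pairs, most frequent first.
function extractVocabulary(srt: string, minLength = 3): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of srt.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines, cue numbers, and timing lines such as
    // "00:00:01,000 --> 00:00:03,000".
    if (trimmed === "" || /^\d+$/.test(trimmed) || trimmed.indexOf("-->") !== -1) {
      continue;
    }
    // Basic Latin letters plus common accented letters; other scripts are ignored.
    const words = trimmed.toLowerCase().match(/[a-z\u00e0-\u00ff']+/g) ?? [];
    for (const word of words) {
      if (word.length < minLength) continue;
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  // Sort by descending count; ties are broken alphabetically for stable output.
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}
```

The frequency list would then feed whatever step maps surface words to terms in the database; that mapping is out of scope for this sketch.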

Tech Stack

| Layer | Technology |
|---|---|
| Monorepo | pnpm workspaces |
| Frontend | React 18, Vite, TanStack Router, TanStack Query, Tailwind CSS |
| Backend | Node.js, Express, TypeScript, WebSockets (ws library) |
| Database | PostgreSQL + Drizzle ORM |
| Auth | Better Auth (Google + GitHub) |
| Validation | Zod (shared between frontend and backend in packages/shared) |
| Testing | Vitest, supertest |
| Deployment | Docker Compose, Caddy, Hetzner VPS |
| CI/CD | Forgejo Actions |
| Data Pipeline | Kaikki (Wiktionary) → SQLite (pipeline.db) → PostgreSQL |

Repository Structure

lila/
├── apps/
│   ├── api/              — Express backend (HTTP + WebSocket)
│   └── web/              — React frontend (Vite, TanStack Router)
├── packages/
│   ├── shared/           — Zod schemas + constants (API/web contract)
│   └── db/               — Drizzle schema, migrations, models, seeding
├── data-pipeline/        — Kaikki extraction → enrichment → PostgreSQL sync
└── documentation/        — Project docs (human + AI-context branches)

Key rule: packages/shared is the single source of truth for all data shapes crossing the API boundary. Both frontend and backend import from it. If a schema changes, TypeScript compilation fails in both places simultaneously.
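
A minimal sketch of what such a shared contract can look like. The project uses Zod schemas for this; to keep the example dependency-free, a hand-written parser stands in for a Zod schema here. The `QuizRequest` shape, its field names, and `parseQuizRequest` are assumptions for illustration; only the language list and the round options come from this overview.

```typescript
// Values from the overview: English paired with it/de/es/fr, 3 or 10 rounds.
const LANGUAGES = ["en", "it", "de", "es", "fr"] as const;
type Language = (typeof LANGUAGES)[number];
type Rounds = 3 | 10;

// Hypothetical request shape that both apps/api and apps/web would import.
interface QuizRequest {
  sourceLang: Language;
  targetLang: Language;
  rounds: Rounds;
}

type ParseResult =
  | { ok: true; value: QuizRequest }
  | { ok: false; error: string };

function isLanguage(v: unknown): v is Language {
  return typeof v === "string" && (LANGUAGES as readonly string[]).indexOf(v) !== -1;
}

function isRounds(v: unknown): v is Rounds {
  return v === 3 || v === 10;
}

// Runtime validation of untrusted input. With Zod this would be schema.safeParse(input).
function parseQuizRequest(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "expected an object" };
  }
  const { sourceLang, targetLang, rounds } = input as Record<string, unknown>;
  if (!isLanguage(sourceLang) || !isLanguage(targetLang)) {
    return { ok: false, error: "unsupported language" };
  }
  // Assumption: every pair has English on exactly one side.
  if (sourceLang === targetLang || (sourceLang !== "en" && targetLang !== "en")) {
    return { ok: false, error: "pair must combine English with one other language" };
  }
  if (!isRounds(rounds)) {
    return { ok: false, error: "rounds must be 3 or 10" };
  }
  return { ok: true, value: { sourceLang, targetLang, rounds } };
}
```

Because the type and the validator live in one module, renaming a field such as `rounds` breaks compilation in every importer at once, which is the effect the key rule describes.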


Key Architecture Principles

  1. Layered architecture — Router → Controller → Service → Model → Database. Each layer only talks to the layer below it.
  2. Server-side answer evaluation — The correct answer is never sent to the frontend. All evaluation happens server-side.
  3. Zod discriminated unions for WebSockets — All WS messages are typed via Zod schemas in packages/shared. The router switches on the type field.
  4. GameSessionStore abstraction — Session state is stored through an interface (InMemoryGameSessionStore now, ValkeyGameSessionStore planned).
  5. Language-neutral data model — terms are concepts; translations are per-language words. Adding a language requires no schema changes.
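
Principles 2 to 4 can be sketched together as follows. This is a simplified illustration, not the actual implementation: the message names, the `GameSession` fields, and the method signatures on `GameSessionStore` are assumptions, the router calls the service directly (collapsing the controller layer), and plain TypeScript unions stand in for the Zod schemas in packages/shared.

```typescript
// Hypothetical message shapes; the real ones are Zod schemas in packages/shared.
type ClientMessage =
  | { type: "lobby:join"; roomCode: string; playerId: string }
  | { type: "answer:submit"; roomCode: string; playerId: string; choice: string };

type ServerMessage =
  | { type: "lobby:joined"; playerId: string }
  | { type: "answer:result"; playerId: string; correct: boolean; score: number };

interface GameSession {
  roomCode: string;
  correctAnswer: string; // never leaves the server
  scores: Record<string, number>;
}

// The service depends on this interface, not on a concrete backend.
// Kept synchronous for brevity; a Valkey-backed store would likely need async methods.
interface GameSessionStore {
  get(roomCode: string): GameSession | undefined;
  save(session: GameSession): void;
}

class InMemoryGameSessionStore implements GameSessionStore {
  private readonly sessions = new Map<string, GameSession>();

  get(roomCode: string): GameSession | undefined {
    return this.sessions.get(roomCode);
  }

  save(session: GameSession): void {
    this.sessions.set(session.roomCode, session);
  }
}

function requireSession(store: GameSessionStore, roomCode: string): GameSession {
  const session = store.get(roomCode);
  if (!session) throw new Error(`unknown room ${roomCode}`);
  return session;
}

// Service layer: evaluates the answer on the server and returns only the verdict.
function submitAnswer(
  store: GameSessionStore,
  roomCode: string,
  playerId: string,
  choice: string,
): { correct: boolean; score: number } {
  const session = requireSession(store, roomCode);
  const correct = choice === session.correctAnswer;
  const score = (session.scores[playerId] ?? 0) + (correct ? 1 : 0);
  session.scores[playerId] = score;
  store.save(session);
  return { correct, score };
}

// Router layer: switches on the `type` discriminant and delegates downward.
function handleMessage(store: GameSessionStore, msg: ClientMessage): ServerMessage {
  switch (msg.type) {
    case "lobby:join": {
      const session = requireSession(store, msg.roomCode);
      session.scores[msg.playerId] = session.scores[msg.playerId] ?? 0;
      store.save(session);
      return { type: "lobby:joined", playerId: msg.playerId };
    }
    case "answer:submit": {
      const result = submitAnswer(store, msg.roomCode, msg.playerId, msg.choice);
      return { type: "answer:result", playerId: msg.playerId, ...result };
    }
  }
}
```

Swapping in a different store only requires another class that implements `GameSessionStore`; `submitAnswer` and `handleMessage` stay unchanged.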

Key Decisions (Summary)

| Topic | Decision | Why |
|---|---|---|
| ORM | Drizzle, not Prisma | No binary, no engine, closer to SQL |
| WebSocket | ws library, not Socket.io | 2–4 players, explicit Zod protocol sufficient |
| Auth | Better Auth, not Keycloak | Embedded middleware, no separate service |
| Answer eval | Server-side only | Correct answer never sent to frontend |
| Data source | Kaikki, not OMW | Sense-disambiguated translations |

Further Reading (AI-Context Files)

| File | What it covers |
|---|---|
| 01-architecture.md | Monorepo structure, layered architecture, data flow diagrams |
| 02-data-model.md | Database schema, tables, relationships, constraints |
| 03-api-contract.md | REST endpoints, request/response schemas, Zod types |
| 04-websocket-protocol.md | WS message types, game flow, auth, state management |
| 05-data-pipeline.md | Kaikki pipeline stages, enrich sub-stages, sync |
| 06-deployment.md | Docker, Caddy, CI/CD, backups |
| prompts/meta.md | How to work with LLMs on this codebase |
| 99-current-task.md | Template: fill this out before giving a task to an LLM |