updating documentation

lila 2026-05-16 01:59:43 +02:00
parent 1ba57c7e9d
commit 7e0311683f
25 changed files with 2660 additions and 226 deletions

@@ -0,0 +1,336 @@
# Lila — Feature & Startup Strategy Roadmap
> **Context for any LLM reading this:** Lila is a language learning/vocabulary app with two core differentiators: (1) **media-based practice** — users learn vocabulary extracted from real media they love (e.g., a Shakira song, the first chapter of _Harry Potter_, or an episode of _Breaking Bad_), and (2) **multiplayer modes** — users practice vocabulary together or competitively in real-time sessions. The app is currently at an early MVP stage. The existing MVP was built around OpenWordNet, which is being replaced because it produces unreliable translations (sense-disambiguation issues). The team is migrating the data pipeline to **Kaikki**, which structures entries per word sense and links translations to specific senses rather than vague general concepts. This migration is the current technical priority. The project is a TypeScript monorepo (pnpm workspaces) with an Express/WebSocket API (`apps/api`), a React frontend using TanStack Router (`apps/web`), a data ingestion pipeline (`data-pipeline`) backed by SQLite/Drizzle, shared packages (`packages/db`, `packages/shared`), and Docker-based deployment orchestrated with Caddy. Documentation restructuring (human-readable vs. AI-optimized docs) is being handled in a separate parallel workstream.
---
## Current State (Ground Truth — 2026-05-15)
### What Works Today ✅
- **Singleplayer quiz** — Duolingo-style, 5 languages (en↔it/de/es/fr), 3 or 10 rounds, POS + difficulty filters
- **Multiplayer** — Create/join lobby by room code, 2–4 players, simultaneous answers, 15s server timer, live scoring, winner screen
- **Auth** — Google + GitHub via Better Auth
- **Deployment** — Live at lilastudy.com, Hetzner VPS, Caddy HTTPS, Docker Compose, CI/CD via Forgejo Actions
- **Database** — PostgreSQL with Drizzle ORM, daily backups
### What's In Progress / Blocked 🚧
- **Kaikki data pipeline migration** — Stage 1 (extract) and Stage 2 (reverse link) complete on sample data. Stage 3 (enrich) is being rewritten for a sub-stage architecture. Stages 4–6 not started.
- **Guest play** — No try-before-signup flow yet. Auth required for all game routes.
- **Game session store** — Still in-memory. Valkey container exists locally but not wired up.
- **Media ingestion** — Not started. No pipeline for subtitles/lyrics → vocab extraction yet.
### The Strategic Gap
The app is currently a **generic vocabulary quiz**. The media-based practice feature (the differentiator) does not exist yet. It depends on:
1. Kaikki pipeline reaching production (fixes translation quality)
2. A media ingestion prototype (subtitles/lyrics → text → vocab extraction → quiz)
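
The first half of that prototype (subtitles → text → vocabulary candidates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Lila's pipeline: it assumes SRT-formatted subtitles, every name in it (`parseSrt`, `extractCandidates`, `formatClock`, `Cue`, `VocabCandidate`) is hypothetical, and it deliberately stops before lemmatization and Kaikki sense-matching, which are the hard parts.

```typescript
// Hypothetical names throughout; this is not Lila's pipeline code.
interface Cue {
  startMs: number; // cue start, in milliseconds from the beginning of the media
  text: string; // cue text with markup removed and line breaks flattened
}

interface VocabCandidate {
  word: string; // lowercased surface form (no lemmatization, no sense matching)
  firstSeenMs: number; // start time of the first cue containing the word
  count: number; // total occurrences across all cues
}

// Convert an SRT timestamp such as "00:04:23,000" to milliseconds.
function parseTimestamp(ts: string): number {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(ts.trim());
  if (!m) throw new Error("Bad SRT timestamp: " + ts);
  return ((Number(m[1]) * 60 + Number(m[2])) * 60 + Number(m[3])) * 1000 + Number(m[4]);
}

// Split an SRT file into cues. Blocks without a timing line are skipped.
function parseSrt(srt: string): Cue[] {
  const cues: Cue[] = [];
  for (const block of srt.replace(/\r/g, "").split(/\n{2,}/)) {
    const lines = block.split("\n").map((l) => l.trim()).filter((l) => l !== "");
    const timingIndex = lines.findIndex((l) => l.includes("-->"));
    if (timingIndex === -1) continue;
    const start = (lines[timingIndex] ?? "").split("-->")[0] ?? "";
    const text = lines.slice(timingIndex + 1).join(" ").replace(/<[^>]+>/g, "");
    cues.push({ startMs: parseTimestamp(start), text });
  }
  return cues;
}

// Collect distinct words with their first timestamp, ordered by first appearance.
function extractCandidates(cues: Cue[], stopwords: Set<string>): VocabCandidate[] {
  const byWord = new Map<string, VocabCandidate>();
  for (const cue of cues) {
    // Approximation: ASCII letters plus the accented Latin range used by en/it/de/es/fr.
    const tokens = cue.text.toLowerCase().match(/[a-z\u00e0-\u024f]+/g) ?? [];
    for (const word of tokens) {
      if (stopwords.has(word)) continue;
      const existing = byWord.get(word);
      if (existing) existing.count += 1;
      else byWord.set(word, { word, firstSeenMs: cue.startMs, count: 1 });
    }
  }
  return Array.from(byWord.values()).sort((a, b) => a.firstSeenMs - b.firstSeenMs);
}

// Format milliseconds as HH:MM:SS for context lines in the quiz UI.
function formatClock(ms: number): string {
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  const totalSeconds = Math.floor(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  return pad(h) + ":" + pad(m) + ":" + pad(totalSeconds % 60);
}
```

Keeping `firstSeenMs` on each candidate is what would later let a quiz card show where the word occurs in the media. Lyrics or book text would need a different front end than `parseSrt`, but could feed the same `extractCandidates` step.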
---
## Stream 1: Documentation Restructure (Parallel Track)
**Status:** ✅ Complete. Human-readable branch (README, STATUS, ARCHITECTURE, BACKLOG, DECISIONS, DEPLOYMENT, DATA_PIPELINE, MODEL_STRATEGY, LLM_SETUP, design/GAME_MODES) and AI-context branch (00–06, prompts/meta.md, 99-current-task.md) are live in `documentation/`.
---
## Stream 2: Feature Roadmap (Three Lanes)
### Lane A — Attract & Keep Users
**Goal:** A user lands on Lila, understands the value in 10 seconds, and completes a satisfying vocabulary practice session in under 2 minutes.
**Current Reality Check:**
- Singleplayer and multiplayer quizzes are **already working and deployed**.
- The app is functional but **not differentiated** — it's a generic vocabulary quiz right now.
- The "wow" moment requires the **media-based practice feature**, which does not exist yet.
**Must-Haves for First Users:**
1. **Guest Play (Zero-Friction Onboarding)** `[in backlog next]`
- No signup required for first session.
- Capture email or OAuth only after the user experiences value.
- Critical for viral loops and investor demos.
- **Status:** Planned in BACKLOG.md. Not yet implemented.
2. **One Polished Media Demo** `[not started]`
- Pick **ONE** piece of media and make it flawless end-to-end: subtitles/lyrics → Kaikki-based vocab extraction with sense-disambiguated translations → playable quiz with timestamps/context.
- Candidates: _Breaking Bad S01E01_, a Shakira song, or _Harry Potter and the Sorcerer's Stone Ch. 1_.
- This is the primary "wow" moment. Differentiates Lila from all other vocabulary apps.
- **Blocker:** Requires (a) Kaikki pipeline in production, and (b) a media ingestion prototype.
3. **One Additional Multiplayer Mode** `[design exists, not implemented]`
- Proves the mode-agnostic lobby architecture works and adds variety beyond the current simultaneous-answer flow.
- **Recommended first mode:** Race to the Top (target score, no round limit) — simplest to implement, changes only scoring logic.
- Alternative: TV Quiz Show (buzzer — first to press answers) — most visually distinct, but requires new answer flow.
- **Status:** Lobby infrastructure is mode-agnostic. Each mode adds game logic only. See `design/GAME_MODES.md` for full designs.
- **Why it matters:** Duolingo has no multiplayer. Anki has no multiplayer. Real-time modes are a genuine differentiator even without media.
4. **Social Proof / Shareable Output** `[not started]`
- Post-game card: "I learned 12 words from _La Tortura_ — can you beat my score?"
- Image export or copy-paste text for Reddit, Discord, Twitter.
- This is the organic growth engine.
- **Blocker:** Requires media demo to exist first.
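
To illustrate why Race to the Top should change only scoring logic, here is a minimal sketch of a mode-agnostic session loop. The `GameMode` interface and every other name below are hypothetical and are not taken from the codebase or from `design/GAME_MODES.md`; one point per correct answer is also an assumption made for the example.

```typescript
// Hypothetical shapes; the real lobby and mode contracts may differ.
interface PlayerAnswer {
  playerId: string;
  correct: boolean;
}

type Scores = Record<string, number>;

interface GameMode {
  name: string;
  // Returns updated scores after one round of simultaneous answers.
  scoreRound(scores: Scores, answers: PlayerAnswer[]): Scores;
  // Returns true when the session should end.
  isFinished(scores: Scores, roundsPlayed: number): boolean;
}

// Assumed scoring rule: one point per correct answer, zero otherwise.
function addPoints(scores: Scores, answers: PlayerAnswer[]): Scores {
  const next: Scores = { ...scores };
  for (const a of answers) {
    next[a.playerId] = (next[a.playerId] ?? 0) + (a.correct ? 1 : 0);
  }
  return next;
}

// Fixed number of rounds, as in the current simultaneous-answer flow.
function fixedRounds(totalRounds: number): GameMode {
  return {
    name: "fixed-rounds",
    scoreRound: addPoints,
    isFinished: (_scores, roundsPlayed) => roundsPlayed >= totalRounds,
  };
}

// Race to the Top: no round limit, ends when any player reaches the target score.
function raceToTheTop(targetScore: number): GameMode {
  return {
    name: "race-to-the-top",
    scoreRound: addPoints,
    isFinished: (scores) =>
      Object.keys(scores).some((id) => (scores[id] ?? 0) >= targetScore),
  };
}

// The loop only talks to the GameMode interface, so it never changes per mode.
function playSession(
  mode: GameMode,
  rounds: PlayerAnswer[][],
): { scores: Scores; roundsPlayed: number } {
  let scores: Scores = {};
  let roundsPlayed = 0;
  for (const answers of rounds) {
    scores = mode.scoreRound(scores, answers);
    roundsPlayed += 1;
    if (mode.isFinished(scores, roundsPlayed)) break;
  }
  return { scores, roundsPlayed };
}
```

In this sketch the two modes differ only in `isFinished`. A buzzer mode such as TV Quiz Show would not fit this interface as written, because it changes who is allowed to answer, which matches the note above that it requires a new answer flow.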
**Already Shipped (Don't Rebuild):**
- ✅ Singleplayer quiz (5 languages, POS/difficulty filters)
- ✅ Multiplayer lobby + real-time game (2–4 players, simultaneous answers, 15s timer, scoring)
- ✅ Auth (Google + GitHub)
- ✅ Live deployment with CI/CD
**Nice-to-Haves (Post-Launch):**
- Additional multiplayer modes (Chain Link, Elimination Round, Cooperative Challenge)
- Leaderboards
- Spaced repetition review queue
- Additional game modes (see design/GAME_MODES.md)
---
### Lane B — Investor-Ready
**Goal:** Walk into a pitch with engagement metrics and a defensibility story tied to Lila's unique data pipeline.
**Checklist:**
1. **Metrics Instrumentation** `[not started]`
- Track: DAU/MAU, session length, quiz completion rate, multiplayer match completion rate, Day 1 / Day 7 retention.
- Tool: PostHog, Mixpanel, or Plausible (self-hosted).
- Need 4–6 weeks of real-user data.
- **Note:** The app is live but has no analytics. This is a prerequisite for any investor conversation.
2. **Growth Mechanic** `[not started]`
- The shareable card (Lane A.4) must be live and instrumented.
- Measure k-factor (viral coefficient). Even 0.3 is a story.
- **Blocker:** Requires media demo.
3. **Defensibility Story** `[partially true, not yet proven]`
- **Data moat:** Lila's Kaikki → media mapping pipeline produces sense-disambiguated vocabulary tied to specific media timestamps. Competitors using generic word lists or OpenWordNet-style dumps cannot match the precision.
- **Current reality:** The Kaikki pipeline exists but is not in production. The media mapping pipeline does not exist yet.
- **What investors would ask:** "You have a quiz app. Where's the media feature you pitched?"
- **Requirement:** Media demo + Kaikki production data must be live before investor conversations.
4. **Monetization Hypothesis** `[not tested]`
- Pick ONE model to test:
- **Freemium:** Free media, premium for user uploads / unlimited multiplayer / advanced analytics.
- **B2B:** Schools and language institutes buy group licenses.
- **Affiliate:** Deep-link to streaming services, books, or music platforms.
- Don't implement yet, but explain LTV/CAC math and pricing assumptions.
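
The k-factor and LTV/CAC arithmetic referenced in this checklist can be written down as a small sketch. These are the common simplified formulas, not a model the roadmap commits to, and the function names and every input value are placeholders rather than Lila data.

```typescript
// All inputs are placeholders; none of these numbers come from Lila data.

// Viral coefficient: new signups generated per existing user.
function kFactor(activeUsers: number, sharesSent: number, signupsFromShares: number): number {
  if (activeUsers === 0 || sharesSent === 0) return 0;
  const sharesPerUser = sharesSent / activeUsers;
  const conversionRate = signupsFromShares / sharesSent;
  return sharesPerUser * conversionRate;
}

// Simple lifetime value: margin-adjusted monthly revenue times expected lifetime (1 / churn).
function lifetimeValue(arpuPerMonth: number, grossMargin: number, monthlyChurn: number): number {
  if (monthlyChurn <= 0) throw new Error("monthlyChurn must be positive");
  return (arpuPerMonth * grossMargin) / monthlyChurn;
}

// Ratio of lifetime value to customer acquisition cost.
function ltvToCac(ltv: number, cac: number): number {
  if (cac <= 0) throw new Error("cac must be positive");
  return ltv / cac;
}
```

For example, 100 active users who send 150 shares that convert into 30 signups give k = 1.5 × 0.2 = 0.3, the figure mentioned under Growth Mechanic. The constant-churn LTV formula is a rough first pass and overstates value if early churn is much higher than later churn.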
**Investor Timeline:**
- **Now → Month 2:** Finish Kaikki pipeline + ship media demo + add metrics.
- **Month 2–3:** Soft launch to 100 strangers, gather retention data.
- **Month 3+:** Investor-ready if retention curves look good.
---
### Lane C — Co-Founder-Ready
**Goal:** A potential co-founder looks at Lila and thinks, "This person can build, and there's a real product here."
**Checklist:**
1. **Clean Codebase + Documentation** `[in progress]`
- README must get a new dev from `git clone` to `docker compose up` in under 5 minutes.
- **Status:** Documentation restructure is complete. Code cleanup is ongoing (BACKLOG.md `next`/`later` items).
2. **Live Demo with Real Users** `[partially done]`
- App is live at lilastudy.com with real auth and multiplayer.
- **Gap:** No real users yet. The current app is a generic quiz — not compelling enough for strangers to stick around.
- **Requirement:** Media demo must be live before pitching to potential co-founders.
3. **Clear Vision Doc** `[not written]`
- 1-page: What Lila is, what it isn't, and the 18-month arc.
- Include: target languages, target media types, target user persona, and what "success" looks like at 6 / 12 / 18 months.
---
## Stream 3: Building the Startup (Technical Founder Journey)
### Phase 1 — Differentiate the MVP (Now → Month 2–3)
**Duration:** 2–3 months
**Rule:** Do NOT look for a co-founder yet.
**Why:** The MVP is already built and deployed. What's missing is the **differentiating feature** (media-based practice). A co-founder won't help you build this faster — it's a technical/data problem. Also, you need leverage: "I built the MVP AND the media pipeline" is stronger than "I have an idea for a media pipeline."
**Tasks:**
1. **Finish Kaikki pipeline** (Stages 3–6)
- Complete enrich sub-stage rewrite
- Run full sample, validate quality
- Production sync to PostgreSQL
- **Timeline:** 2–4 weeks
2. **Build media ingestion prototype**
- Pick ONE media piece (Breaking Bad S01E01, Shakira song, or Harry Potter Ch. 1)
- Pipeline: subtitles/lyrics → text extraction → vocabulary identification → Kaikki sense-matching → quiz generation
- UI: media selection → quiz with context ("This word appears at 00:04:23")
- **Timeline:** 2–4 weeks (parallel with Kaikki pipeline)
3. **Ship guest play**
- Make auth optional on game routes
- "Try without account" button on landing page
- Capture email/OAuth after first session
- **Timeline:** 1 week
4. **Add metrics instrumentation**
- PostHog or Plausible
- Track: signups, quiz starts, completions, multiplayer matches, retention
- **Timeline:** 1 week
5. **Soft launch to 100 strangers**
- Reddit (r/languagelearning, r/Anki), language-learning Discords, Hacker News Show HN
- Collect qualitative feedback
- **Timeline:** 1 week (after media demo is live)
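
For task 3, making auth optional on game routes could look like the middleware sketch below. It is written against minimal request/response shapes rather than Express's real types so that it stays self-contained. `optionalAuth`, `requireUser`, `lookupSession`, and the `Identity` type are all hypothetical, and the synchronous session lookup is a stand-in for whatever Better Auth call the API actually uses.

```typescript
// Hypothetical types; a real implementation would use the framework's own request type.
type Identity =
  | { kind: "user"; userId: string }
  | { kind: "guest"; guestId: string };

interface GameRequest {
  headers: Record<string, string | undefined>;
  identity?: Identity;
}

interface GameResponse {
  statusCode: number;
  end(): void;
}

type Next = () => void;

// Stand-in for the real session check: returns a user id, or undefined if the token is invalid.
type SessionLookup = (token: string) => string | undefined;

// Never rejects: signed-in players get a user identity, everyone else becomes a guest.
function optionalAuth(lookupSession: SessionLookup) {
  return (req: GameRequest, _res: GameResponse, next: Next): void => {
    const header = req.headers["authorization"];
    const token = header ? header.replace(/^Bearer /, "") : undefined;
    const userId = token ? lookupSession(token) : undefined;
    req.identity = userId
      ? { kind: "user", userId }
      : // Placeholder id; not cryptographically secure.
        { kind: "guest", guestId: "guest-" + Math.random().toString(36).slice(2, 10) };
    next();
  };
}

// For routes that should stay protected, such as anything that persists progress.
function requireUser(req: GameRequest, res: GameResponse, next: Next): void {
  if (req.identity && req.identity.kind === "user") {
    next();
    return;
  }
  res.statusCode = 401;
  res.end();
}
```

The design choice here is to attach an identity to every request instead of branching on "logged in or not" inside game logic, so a guest session can later be linked to a real account when the email/OAuth prompt appears after the first session.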
---
### Phase 2 — Validate & Measure (Month 2–4)
**Goal:** Prove that the media feature resonates and that retention curves exist.
**Tasks:**
- Analyze metrics: Do users who try media-based practice return more than singleplayer-only users?
- Iterate on media selection and quiz UX based on feedback
- Polish shareable output (social cards)
- Fix hardening items from BACKLOG.md Phase 7
**Decision gate:** If 100 users show positive retention signals (Day 1 > 30%, Day 7 > 10%), proceed to Phase 3. If not, iterate on media feature or pivot.
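
The decision gate can be made concrete with a small retention calculation. This sketch assumes one common definition (a user is retained on Day N if they are active exactly N days after signup) and counts only users whose signup is at least N days old. The data shape and function names are hypothetical and not tied to PostHog or Plausible exports.

```typescript
// Hypothetical data shape; day values are integer day indices (e.g., days since launch).
interface UserActivity {
  signupDay: number; // day of the user's first session
  activeDays: number[]; // days on which the user completed at least one quiz
}

// Share of eligible users who were active exactly `n` days after signing up.
// Users who signed up fewer than `n` days before `today` are excluded from the cohort.
function dayNRetention(users: UserActivity[], n: number, today: number): number {
  const eligible = users.filter((u) => u.signupDay + n <= today);
  if (eligible.length === 0) return 0;
  const retained = eligible.filter((u) =>
    u.activeDays.some((d) => d === u.signupDay + n),
  ).length;
  return retained / eligible.length;
}

// The gate from the roadmap: Day 1 above 30% and Day 7 above 10%.
function passesDecisionGate(users: UserActivity[], today: number): boolean {
  return dayNRetention(users, 1, today) > 0.3 && dayNRetention(users, 7, today) > 0.1;
}
```

With only 100 users each retained user moves the rate by a full percentage point, so a result close to either threshold is better read as "keep measuring" than as a pass or fail.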
---
### Phase 3 — Define the Gap (Month 3–5)
**Goal:** Identify exactly what you suck at or hate doing.
**The wrong approach:** "I need an MBA."
**The right approach:** "I need someone who has done [specific thing] before."
**Common gaps for technical founders:**
| Gap | Profile | What They Do |
|-----|---------|--------------|
| Fundraising | Former founder who raised Seed/Series A | Writes deck, runs investor meetings, handles term sheets |
| Monetization | Product/Growth PM from ed-tech | Designs pricing, runs experiments, builds B2B pipeline |
| Partnerships | BD person from media/streaming | Negotiates content deals, affiliate partnerships |
| Operations | COO-type | Runs hiring, finance, legal, day-to-day ops |
| Marketing | Growth marketer | Runs paid/organic acquisition, community building |
**Your job:** After 100 users, the gap becomes obvious. If no one converts to signup, you need growth/marketing. If schools email asking for licenses, you need BD/monetization. If investors ask questions you can't answer, you need a fundraising co-founder.
---
### Phase 4 — Co-Founder Search (Targeted, Month 4–6)
**Goal:** Find 2–3 candidates and work with them on a trial basis.
**Where to look:**
- **Founder dating events:** YC Co-Founder Matching, Indie Hackers meetups, local accelerators.
- **Angel investor intros:** Ask any angel you meet for intros to founders they backed who might want a new project.
- **Industry communities:** Ed-tech Slack/Discord groups, language-learning subreddits (look for people complaining about Duolingo — they care).
- **LinkedIn outbound:** Search "former PM at Duolingo," "former growth at Babbel." Cold DM with the Lila demo, not a resume.
**Trial period (4–8 weeks):**
- Work together on a concrete project (e.g., "Design and test a monetization experiment").
- No equity commitment. Pay as a contractor if needed.
- Evaluate: Do they deliver? Do you communicate well? Do they care about the mission?
---
### Phase 5 — Formalize (Only After Trial, Month 6+)
**Goal:** Legal structure, equity split, roles.
**Equity mindset:**
- **50/50 is dangerous** unless you truly could not build without them from day one. You already built the Lila MVP alone — you have leverage.
- **Suggested:** 60/40 or 65/35 with 4-year vesting and a 1-year cliff for both founders.
- **Vesting is non-negotiable.** If they leave after 6 months, before the 1-year cliff, they keep nothing.
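
The vesting arithmetic can be checked with a short sketch. It assumes the most common schedule (monthly linear vesting over 48 months, with the first 12 months' worth vesting all at once when the cliff is reached); an actual founder agreement may define this differently, and the function names are illustrative only.

```typescript
// Fraction of a founder's grant that has vested after `monthsServed` whole months.
// Assumption: linear monthly vesting, nothing before the cliff, capped at 100%.
function vestedFraction(monthsServed: number, vestingMonths = 48, cliffMonths = 12): number {
  const months = Math.floor(monthsServed);
  if (months < cliffMonths) return 0;
  return Math.min(months, vestingMonths) / vestingMonths;
}

// Share of the whole company a founder keeps if they leave after `monthsServed`.
function equityKept(stake: number, monthsServed: number): number {
  return stake * vestedFraction(monthsServed);
}
```

Under these assumptions, a co-founder with a 40% stake keeps nothing after 6 months, about 10% of the company after 12 months, and the full 40% only after 48 months.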
**Roles:**
- You: CTO / Product (own technical vision, architecture, data pipeline).
- Them: CEO / COO / CMO depending on profile (own business side, fundraising, partnerships).
- Decision-making: You retain veto on technical/product decisions; they lead business/fundraising.
**Legal:**
- Incorporate properly (C-Corp if US; Ltd in the UK, or the local limited-company equivalent such as a GmbH in the EU).
- IP assignment agreement: everything built so far belongs to the company.
- Founder agreement: roles, vesting, termination, dispute resolution.
---
## Suggested Execution Order
### Month 1 (Now)
- **Week 1–2:** Finish Kaikki Stage 3 enrich sub-stage rewrite. Run full sample, validate quality. Start first additional multiplayer mode (Race to the Top recommended).
- **Week 3:** Ship guest play. Make auth optional on game routes.
- **Week 4:** Start media ingestion prototype (parallel). Pick one media piece, get text extraction working.
### Month 2
- **Week 1–2:** Complete media ingestion prototype. End-to-end: media → quiz. Complete first additional multiplayer mode.
- **Week 3:** Add metrics (PostHog/Plausible). Polish shareable output (social cards).
- **Week 4:** Integrate media demo with multiplayer mode. Test combined flow.
### Month 3
- **Week 1:** Soft launch to 100 strangers. Gather feedback.
- **Week 2–3:** Iterate based on feedback. Fix hardening items from BACKLOG.md.
- **Week 4:** Analyze metrics. Decision gate: proceed or iterate?
### Month 4–6
- If metrics are positive: start co-founder search (Phase 4).
- If metrics are weak: iterate on media feature or pivot the value proposition.
---
## Open Questions to Refine This Roadmap
Answer these to make the roadmap more specific:
### Product Reality Check
- [ ] What is the exact blocker on Kaikki Stage 3? (Is it the sub-stage rewrite, or something else?)
- [ ] Which media piece should be the first demo? (Breaking Bad, Shakira, or Harry Potter?)
- [ ] Do you have subtitle/lyrics data for the chosen media piece?
### Target Audience
- [ ] Who is the ideal first user? (e.g., "German intermediate learner who watches Netflix")
- [ ] What language pair should the media demo target?
### Business Model
- [ ] Do you have a monetization hypothesis? (freemium / B2B / affiliate)
- [ ] Any thoughts on unit economics?
### Competitive Landscape
- [ ] Who do you see as direct competitors? (LingQ? FluentU? Duolingo? Anki?)
- [ ] What do they do poorly that Lila fixes?
### Runway & Constraints
- [ ] Is this full-time or nights-and-weekends?
- [ ] Do you have any funding, savings runway, or revenue?
- [ ] What's your hard deadline for "showable to users"?
### Co-Founder Search
- [ ] Local or remote-first?
- [ ] What do you want them to _do_? (fundraise, partnerships, monetization, operations?)
- [ ] Do you already know candidates, or starting from zero?
- [ ] Equity mindset: 50/50, or majority control for yourself?