updating documentation

2026-04-28 17:26:01 +02:00 · 2026-04-28 17:26:01 +02:00 · fd9667c1fd
commit fd9667c1fd
parent 98c59f33c5
2 changed files with 351 additions and 6 deletions
--- a/documentation/backlog.md
+++ b/documentation/backlog.md
@ -8,15 +8,9 @@ Labels: `[feature]` `[infra]` `[security]` `[ux]` `[debt]`
 Things that are actively in progress or should be picked up immediately. Mostly operational risk and the remaining phase 7 hardening work.
 - **Google OAuth publishing** `[infra]`
  Only test users can currently log in via Google. Publish the OAuth consent screen so any Google user can sign in — requires branding verification in Google Cloud Console.
 - **Hetzner domain migration check** `[infra]`
  Verify whether the lilastudy.com domain needs to be migrated following a Hetzner DNS change. Check Hetzner dashboard for any pending migration notice.
 - **Conditionally register OAuth providers** `[debt]`
  Better Auth logs warnings when social providers are registered without credentials (`Social provider google is missing clientId or clientSecret`). Instead of registering all providers unconditionally, only add a provider to the config when its credentials are present in the environment. Keeps local dev clean for contributors who don't have OAuth apps set up.
 ---
 ## next
@ -69,6 +63,9 @@ Clearly planned work, not yet started. No hard ordering — sequence based on wh
 - **Tighten CSP to remove unsafe-inline** `[security]`
  Current script-src uses 'unsafe-inline' to accommodate framework-injected inline scripts (likely TanStack Router hydration). Tightening this would require nonce-based CSP, which needs server-rendered HTML or a Caddy layer that injects per-request nonces. Not urgent — pragmatic CSP with 'unsafe-inline' is mainstream for SPAs at this scale. Revisit if the app handles more sensitive data or grows a meaningful user base
 - **Publish Google OAuth consent screen** `[infra]`
  App is currently in testing mode, which caps OAuth sign-ins at 100 users. Before hitting that limit, publish the consent screen in Google Cloud Console. Basic scopes (email, profile, openid) require no Google review — just fill in branding fields (app name, logo, support email, privacy policy URL) and click publish. Trigger: do this before reaching 80 users.
 ---
 ## later
--- a/documentation/roasts/gameService.md
+++ b/documentation/roasts/gameService.md
@ -0,0 +1,348 @@
 # 🔥 GameService Roast: `apps/api/src/services/gameService.ts`
 > *"It works on my machine" is not a scalability strategy.*
 **Project:** lila — Vocabulary Trainer  
 **File Roasted:** `gameService.ts`  
 **Date:** $(date)  
 **Roaster:** Qwen3.6  
 ---
 ## 📋 Executive Summary
 | Metric        | Score    | Notes                                                |
 | ------------- | -------- | ---------------------------------------------------- |
 | Code Quality  | 8/10     | Clean layering, good types, consistent style         |
 | Correctness   | 6/10     | Race condition + N+1 query are critical              |
 | Test Coverage | 7/10     | Good happy-path tests, missing concurrency tests     |
 | Scalability   | 5/10     | Will choke at ~100 concurrent users without fixes    |
 | **Overall**   | **7/10** | Solid foundation, but fix the footguns before launch |
 ---
 ## 🚨 Critical Issues (Fix Before Production)
 ### 1. Race Condition: Lost Update in `evaluateAnswer`
 **Location:** `gameService.ts:45-58` + `InMemoryGameSessionStore.ts:update()`
 // Current flow (VULNERABLE):
 const session = await store.get(submission.sessionId);  // READ
 const updatedAnswers = new Map(session.answers);         // MODIFY (local copy)
 updatedAnswers.delete(submission.questionId);
 await store.update(submission.sessionId, { answers: updatedAnswers }); // WRITE
 The Attack:
    Client submits answer A and answer B for the same question (network retry, bug, or malice)
    Both requests read the same session.answers Map (question still present)
    Both delete the question from their local copy
    Both write back → second write overwrites first
    Result: One answer is silently lost, session state desyncs
 Why Tests Missed It: Vitest runs tests synchronously. Race conditions require deliberate concurrency testing.
 Fix Options:
 // Option A: Add atomic operation to store interface
 interface GameSessionStore {
  deleteAnswer(sessionId: string, questionId: string): Promise<boolean>;
 }
 // Option B: Use Valkey Lua script for atomic read-modify-write
 // Option C: Optimistic locking with version numbers
 Priority: 🔴 CRITICAL — Data integrity issue
 2. N+1 Query: Database Performance Bomb
 Location: gameService.ts:24-26 + termModel.ts:getDistractors()
 // For each of N terms, we call getDistractors():
 const questions: GameQuestion[] = await Promise.all(
  terms.map(async (term) => {
    const distractorTexts = await getDistractors(term.termId, ...); // 🚩 N queries!
  })
 );
 Impact Analysis:
 Rounds
 DB Queries
 At 50 concurrent users
 3
 1 + 3 = 4
 200 queries/min
 10
 1 + 10 = 11
 550 queries/min
 20
 1 + 20 = 21
 1,050 queries/min
 Each getDistractors() runs:
 SELECT text FROM terms 
 JOIN translations ON ... 
 WHERE pos = $1 AND difficulty = $2 AND term_id != $3 AND text != $4 
 ORDER BY RANDOM() LIMIT 6
 Fix: Batch Fetch Distractors
 // Fetch all distractors in ONE query
 const allDistractors = await db
  .select({ termId: terms.id, text: translations.text })
  .from(terms)
  .innerJoin(translations, /* ... */)
  .where(and(
    eq(terms.pos, pos),
    eq(translations.difficulty, difficulty),
    inArray(terms.id, termIds), // Batch!
  ))
  .limit(DISTRACTOR_FETCH_COUNT * termIds.length);
 // Group by termId in JS, then slice to 3 unique distractors per term
 const distractorsByTerm = groupByTermId(allDistractors);
 Priority: 🔴 CRITICAL — Performance/scalability issue
 3. Error Handling Inconsistency
 Location: gameService.ts:33-36
 if (uniqueDistractors.length < 3) {
  throw new Error(`Not enough unique distractors for term: ${term.targetText}`); // 🚩
 }
 Problem: Raw Error bypasses your errorHandler middleware:
    No HTTP status mapping (defaults to 500)
    No structured logging
    Inconsistent API responses
 Fix:
 import { UnprocessableEntityError } from "../errors/AppError.js";
 if (uniqueDistractors.length < 3) {
  logger.warn({ termId: term.termId, uniqueCount: uniqueDistractors.length }, 
              "insufficient_distractors");
  throw new UnprocessableEntityError(
    `Not enough unique distractors for term: ${term.targetText}`
  );
 }
 Priority: 🟡 HIGH — Observability & UX issue
 ⚠️ High-Severity Smells
 4. Code Duplication: Singleplayer vs Multiplayer
 Compare: gameService.ts vs multiplayerGameService.ts
 // gameService.ts
 const optionTexts = [term.targetText, ...uniqueDistractors.slice(0, 3)];
 const shuffledTexts = shuffleArray(optionTexts);
 const correctOptionId = shuffledTexts.indexOf(term.targetText);
 // multiplayerGameService.ts (lines 35-45)
 const optionTexts = [correctAnswer.targetText, ...distractorTexts];
 const shuffledTexts = shuffle(optionTexts); // Different function, same logic!
 const correctOptionId = shuffledTexts.indexOf(correctAnswer.targetText);
 Risks:
    Fix shuffle bias in one place, forget the other
    Add new option type (e.g., etymology hint), update one service only
    Harder to test core game logic in isolation
 Fix: Extract pure function to @lila/shared or new @lila/game-logic:
 // packages/shared/src/game-logic.ts
 export const buildQuestionOptions = (
  correctAnswer: string,
  distractors: string[],
  optionCount: number = 4
 ): { options: AnswerOption[]; correctOptionId: number } => {
  const uniqueDistractors = [...new Set(distractors.filter(d => d !== correctAnswer))];
  const optionTexts = [correctAnswer, ...uniqueDistractors.slice(0, optionCount - 1)];
  const shuffled = shuffleSecure(optionTexts);
  const correctOptionId = shuffled.indexOf(correctAnswer);
  return {
    options: shuffled.map((text, idx) => ({ optionId: idx, text })),
    correctOptionId
  };
 };
 Priority: 🟡 HIGH — Maintainability issue
 5. Shuffle Bias: Math.random() Trap
 Location: utils.ts:shuffleArray() + multiplayerGameService.ts:shuffle()
 export const shuffleArray = <T>(array: T[]): T[] => {
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 🚩 Modulo bias + non-crypto RNG
    // ...
  }
 };
 The Math:
    Math.random() has ~53 bits of entropy (fine for vocab)
    Math.floor(rand * n) has modulo bias when n isn't a power of 2
    For n=4: bias is ~0.01% (tiny, but non-zero)
 When It Matters:
    Competitive leaderboards ("option 0 is correct 26% of the time")
    Achievement systems based on answer patterns
    Security-sensitive features (not applicable here, but principle matters)
 Fix (if needed):
 import { randomBytes } from "crypto";
 const shuffleSecure = <T>(array: T[]): T[] => {
  const result = [...array];
  for (let i = result.length - 1; i > 0; i--) {
    // Use crypto.getRandomValues for better randomness
    const rand = randomBytes(4).readUInt32LE(0);
    const j = rand % (i + 1);
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
 };
 Priority: 🟢 LOW — Document tradeoff and move on for now
 6. Test Coverage Gaps
 File: gameService.test.ts
 ✅ Well Tested:
    Happy path: session creation, answer evaluation
    Edge cases: duplicate distractors, empty results, invalid inputs
    Error propagation from DB layer
 ❌ Missing Tests:
 // 1. Concurrency test (race condition)
 it("rejects duplicate answers for same question under concurrent load", async () => {
  const session = await createGameSession(validRequest, store, "user-1");
  const question = session.questions[0]!;
  // Submit two answers simultaneously
  const [result1, result2] = await Promise.allSettled([
    evaluateAnswer({ sessionId, questionId, selectedOptionId: 0 }, store, "user-1"),
    evaluateAnswer({ sessionId, questionId, selectedOptionId: 1 }, store, "user-1"),
  ]);
  // Exactly one should succeed, one should throw ConflictError
  expect([result1, result2].filter(r => r.status === "fulfilled")).toHaveLength(1);
 });
 // 2. TTL expiration test
 it("deletes session after TTL expires", async () => {
  vi.useFakeTimers();
  const session = await createGameSession(validRequest, store, "user-1");
  vi.advanceTimersByTime(31 * 60 * 1000); // 31 minutes
  await expect(store.get(session.sessionId)).resolves.toBeNull();
 });
 // 3. Distractor fallback strategy test
 it("uses fallback when <3 unique distractors available", async () => {
  mockGetDistractors.mockResolvedValue(["same", "same", "same", "same"]);
  // Should either: (a) fetch from broader pool, or (b) reduce rounds gracefully
 });
 Priority: 🟡 HIGH — Prevents regression on critical fixes
 🧼 Code Quality Nitpicks
 7. Magic Numbers
 // gameService.ts:52
 await store.create(sessionId, {...}, 30 * 60 * 1000); // What is this?
 // termModel.ts:65
 .limit(count); // count=6, but why?
 // shared/schemas/game.ts:15
 optionId: z.number().int().min(0).max(3), // Why 4 options?
 Fix: Centralize in @lila/shared/constants.ts:
 export const GAME_SESSION_TTL_MS = 30 * 60 * 1000;
 export const DISTRACTOR_FETCH_COUNT = 6;
 export const GAME_OPTION_COUNT = 4;
 export const MIN_UNIQUE_DISTRACTORS = 3;
 8. Mutable Reference Leakage
 Location: InMemoryGameSessionStore.ts:get()
 get(sessionId: string): Promise<GameSessionData | null> {
  return Promise.resolve(entry.data); // 🚩 Returns mutable reference to internal state
 }
 Risk: Any code that does session.answers.delete(...) mutates the store's internal Map directly.
 Fix:
 // Option A: Deep clone (simple, works for this data shape)
 return Promise.resolve(structuredClone(entry.data));
 // Option B: Return readonly view (TypeScript-only protection)
 return Promise.resolve(entry.data as Readonly<GameSessionData>);
 // Option C: Use immutable data structures (overkill for now)
 9. Zero Observability
 Problem: No logging, no metrics. You're flying blind in production.
 Minimal Fix (5 minutes):
 // apps/api/src/lib/logger.ts
 import pino from "pino";
 export const logger = pino({ 
  level: process.env.LOG_LEVEL || "info",
  transport: process.env.NODE_ENV === "production" 
    ? { target: "pino-pretty" } 
    : undefined 
 });
 // In gameService.ts:
 import { logger } from "../lib/logger.js";
 logger.info(
  { userId, sourceLang, targetLang, termCount: terms.length },
  "game_session_created"
 );
 logger.debug(
  { sessionId, questionId, isCorrect, responseTimeMs },
  "answer_evaluated"
 );
 Bonus: Export a Prometheus histogram for game_service_duration_seconds.
 10. ORDER BY RANDOM() Time Bomb
 Location: termModel.ts:getGameTerms() + getDistractors()
 .orderBy(sql`RANDOM()`) // 🚩 Fine for 10k rows, slow for 1M
 The Comment Admits It:
 // TODO(post-mvp): ORDER BY RANDOM() sorts the entire filtered result set...
 Reality Check: "Post-MVP" never comes without a ticket.
 Fix Options:
 -- Option A: Pre-computed random_seed column (updated nightly)
 WHERE ... AND random_seed >= random() 
 ORDER BY random_seed 
 LIMIT $1
 -- Option B: TABLESAMPLE for approximate sampling (Postgres 9.5+)
 FROM terms TABLESAMPLE SYSTEM(10) 
 WHERE ... 
 LIMIT $1
 -- Option C: Random offset (simple, but still scans)
 OFFSET floor(random() * (SELECT count(*) FROM terms WHERE ...))
 Action: Add a ticket to documentation/tickets/t00009.md now.