lila/documentation/ai-context/04-websocket-protocol.md
2026-05-16 01:59:43 +02:00

# 04 — WebSocket Protocol
> **Purpose:** Deep dive into WebSocket lifecycle, state management, and edge cases for LLMs working on multiplayer features. Concatenate with 00-project-overview.md and 99-current-task.md.
> **Last updated:** 2026-05-15
> **Depends on:** 00-project-overview.md, 03-api-contract.md
---
## Connection Lifecycle
### 1. Upgrade
```
Client: GET wss://api.lilastudy.com/ws
Headers: Cookie: better-auth.session=...
Server: Validates session via Better Auth (reads cookie, looks up in DB)
→ Valid: 101 Switching Protocols, connection established
→ Invalid: 401 Unauthorized, connection rejected
```
**Auth is mandatory.** No anonymous WebSocket connections. Guest play (if implemented) would need a different auth strategy here.
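
The upgrade decision above can be sketched as a pure function. This is a minimal illustration, not the code in `apps/api/src/ws/auth.ts`: `readCookie`, `authorizeUpgrade`, and the synchronous `SessionLookup` callback are hypothetical stand-ins (a real session lookup against the database would presumably be asynchronous). Only the cookie name `better-auth.session` is taken from the example above.

```typescript
// Hypothetical types and helpers, for illustration only.
type Session = { userId: string };
type SessionLookup = (token: string) => Session | null;

// Extract one cookie value from a raw Cookie header, or null if absent.
function readCookie(header: string | undefined, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) return part.slice(eq + 1).trim();
  }
  return null;
}

// Decide the upgrade outcome: 101 with the session, or 401 without one.
function authorizeUpgrade(
  cookieHeader: string | undefined,
  lookup: SessionLookup,
): { status: 101; session: Session } | { status: 401 } {
  const token = readCookie(cookieHeader, "better-auth.session");
  const session = token ? lookup(token) : null;
  return session ? { status: 101, session } : { status: 401 };
}
```

Keeping the decision separate from the socket plumbing makes the "no anonymous connections" rule easy to unit-test without opening a real connection.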
### 2. Message Routing
After connection, all messages flow through:
```
Raw JSON message
Zod safeParse against WebSocketMessageSchema (discriminated union on `type`)
Router switches on `type` → dispatches to handler
Handler executes business logic → broadcasts to room
```
**Invalid messages:** Parse failures are logged and silently dropped. The client receives no error response — this is intentional to prevent error spam from malformed clients.
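
The parse-then-route flow, including the silent drop, might look like the sketch below. To stay dependency-free it uses a hand-written `parseMessage` as a stand-in for `WebSocketMessageSchema.safeParse`; the payload fields (`code`, `optionId`) and the `route` function are assumptions, and the real schemas live in `packages/shared/src/schemas/`.

```typescript
// Assumed message shapes; only the `type` values come from this document.
type WsMessage =
  | { type: "lobby:join"; code: string }
  | { type: "lobby:leave"; code: string }
  | { type: "lobby:start"; code: string }
  | { type: "game:answer"; code: string; optionId: number };

// Hand-written stand-in for a Zod discriminated-union safeParse.
function parseMessage(raw: string): WsMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, code, optionId } = data as Record<string, unknown>;
  if (typeof code !== "string") return null;
  if (type === "lobby:join") return { type: "lobby:join", code };
  if (type === "lobby:leave") return { type: "lobby:leave", code };
  if (type === "lobby:start") return { type: "lobby:start", code };
  if (type === "game:answer" && typeof optionId === "number") {
    return { type: "game:answer", code, optionId };
  }
  return null; // unknown type or wrong payload
}

// Route one raw frame. Invalid input is logged and dropped with no reply.
function route(raw: string, log: (line: string) => void): string | null {
  const msg = parseMessage(raw);
  if (msg === null) {
    log("dropped invalid message");
    return null; // no error frame is sent back to the client
  }
  switch (msg.type) {
    case "lobby:join":
    case "lobby:leave":
    case "lobby:start":
      return `lobby handler <- ${msg.type} (${msg.code})`;
    case "game:answer":
      return `game handler <- option ${msg.optionId} (${msg.code})`;
  }
}
```

Because the union is discriminated on `type`, the compiler narrows `msg` inside each `case`, so `msg.optionId` is only reachable for `game:answer`.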
### 3. Disconnect
When a client disconnects (browser close, network loss, page navigation):
```
Connection close event
Handler removes player from lobby (if in one)
Broadcasts updated lobby:state to remaining players
If game in progress and player disconnects:
→ Player is marked as "disconnected" (not removed from game state)
→ Their answer slot is treated as "no answer" (timeout)
→ Game continues
```
**No automatic reconnect.** The client must manually reconnect and re-join the lobby. Graceful reconnect with state restoration is planned (BACKLOG.md `next`).
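
A rough sketch of the close handler's bookkeeping follows. The `Room` shape, the `handleDisconnect` name, and the `players` payload field are hypothetical; the sketch only mirrors the rules listed above (remove from lobby, keep the slot in a running game and mark it disconnected).

```typescript
// Hypothetical in-memory shapes, for illustration only.
interface Room {
  members: Set<string>; // lobby membership
  game: { connected: Map<string, boolean> } | null; // null when no game is running
}

// Apply the disconnect rules and return the lobby:state payload to broadcast,
// or null if the player was not in this lobby.
function handleDisconnect(
  room: Room,
  playerId: string,
): { type: "lobby:state"; players: string[] } | null {
  const wasMember = room.members.delete(playerId);
  if (room.game !== null && room.game.connected.has(playerId)) {
    // Keep the player in game state; their answer slot simply times out.
    room.game.connected.set(playerId, false);
  }
  return wasMember
    ? { type: "lobby:state", players: Array.from(room.members) }
    : null;
}
```
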
---
## State Management
### Two-Tier Storage
| State Type | Storage | Durability | Use Case |
| ---------------- | ---------------------------------------------------------------- | ---------- | ------------------------------------------------- |
| Lobby membership | PostgreSQL (`lobbies`, `lobby_players`) | Durable | Who is in which room, who is host |
| Game state | In-memory (`InMemoryLobbyGameStore`, `InMemoryGameSessionStore`) | Ephemeral | Current question, scores, timer, answers received |
**Why the split?** Lobby membership must survive server restarts (players shouldn't be kicked on deploy). Game state is ephemeral by design — a game lasts minutes, and losing state on restart is acceptable for MVP.
### In-Memory Store Structure
```typescript
// Conceptual — actual implementation in apps/api/src/gameSessionStore/
type PlayerId = string; // assumed: player IDs are strings

interface InMemoryGameState {
  [lobbyCode: string]: {
    status: "waiting" | "question" | "result" | "finished";
    currentRound: number;
    totalRounds: number;
    currentQuestion: GameQuestion | null;
    answers: Map<PlayerId, { optionId: number; timestamp: number }>;
    scores: Map<PlayerId, number>;
    timer: NodeJS.Timeout | null; // 15s server timer
    questionStartTime: number; // For speed-based tiebreaking
  };
}
```
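
To make the "ephemeral" property concrete, here is a minimal sketch of a store keyed by lobby code. `InMemoryStore` and its methods are hypothetical names, not the API of `InMemoryGameSessionStore`; the point is only that state lives in a process-local `Map` and a fresh instance (as after a restart) starts empty.

```typescript
// Hypothetical generic store; T would be the per-lobby game state.
class InMemoryStore<T> {
  private readonly games = new Map<string, T>();

  // Register state for a lobby; refuses to overwrite a running game.
  create(lobbyCode: string, initial: T): T {
    if (this.games.has(lobbyCode)) {
      throw new Error(`game already exists for ${lobbyCode}`);
    }
    this.games.set(lobbyCode, initial);
    return initial;
  }

  get(lobbyCode: string): T | undefined {
    return this.games.get(lobbyCode);
  }

  // Returns true if a game was removed.
  delete(lobbyCode: string): boolean {
    return this.games.delete(lobbyCode);
  }
}
```
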
---
## The 15-Second Timer
### Implementation
```
Host sends lobby:start
Server generates questions, stores in game state
Broadcast game:question to all players
START 15-second timer (NodeJS setTimeout)
Player answers collected in Map<playerId, answer>
Timer expires OR all players answered
STOP timer, evaluate answers, broadcast game:answer_result
If more rounds: wait 3s → broadcast next game:question → restart timer
If last round: broadcast game:finished
```
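
The "timer expires OR all players answered" step can be sketched as below. This is an illustration under assumptions, not the logic in `multiplayerGameService.ts`: `startRound`, the injectable `Scheduler`, and the first-answer-wins rule are all hypothetical. The scheduler is injectable only so the sketch can be exercised without waiting 15 seconds.

```typescript
// Minimal scheduler interface so the round can run against a fake clock.
type Scheduler = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};
const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
};

const ROUND_MS = 15_000;
type EndReason = "timeout" | "all_answered";

// Runs one round: ends on timer expiry or when every player has answered.
function startRound(
  playerIds: string[],
  onEnd: (answers: Map<string, number>, reason: EndReason) => void,
  scheduler: Scheduler = realScheduler,
) {
  const answers = new Map<string, number>();
  let ended = false;
  const finish = (reason: EndReason) => {
    if (ended) return; // the timer and the last answer must not both end the round
    ended = true;
    scheduler.clear(handle);
    onEnd(answers, reason);
  };
  const handle = scheduler.set(() => finish("timeout"), ROUND_MS);
  return {
    // Returns false for late, duplicate, or unknown-player answers.
    answer(playerId: string, optionId: number): boolean {
      if (ended || !playerIds.includes(playerId) || answers.has(playerId)) return false;
      answers.set(playerId, optionId);
      if (answers.size === playerIds.length) finish("all_answered");
      return true;
    },
  };
}
```

The `ended` flag is the important part: whichever of the two end conditions fires first wins, and the other becomes a no-op.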
### Timer Edge Cases
| Scenario | Behavior |
| -------------------------- | ------------------------------------------------------------------------- |
| Player answers at 14.9s | Valid, collected before timer expiry |
| Player answers at 15.1s | Rejected, treated as timeout. Timer already fired. |
| All players answer early | Timer is cleared early, round proceeds immediately |
| No one answers | All players get 0 points for that round, next round starts |
| Host disconnects mid-game | Game continues, any player can see results. No "host transfer" logic yet. |
| Non-host sends lobby:start | Silently ignored (or rejected — check implementation) |
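
The first two rows of the table (14.9s accepted, 15.1s rejected) can be expressed as a small guard. This is a sketch under assumptions: `submitAnswer` and `AnswerOutcome` are hypothetical names, and the elapsed-time check is a belt-and-braces addition, since in the flow above the timer firing already moves the round out of the `"question"` phase.

```typescript
const ROUND_MS = 15_000;

// Subset of the per-lobby game state relevant to answer collection.
interface RoundState {
  status: "waiting" | "question" | "result" | "finished";
  questionStartTime: number; // ms timestamp when the question was broadcast
  answers: Map<string, { optionId: number; timestamp: number }>;
}

type AnswerOutcome = "accepted" | "late" | "not_in_question_phase";

// Accept an answer only while the question phase is open and within 15s.
function submitAnswer(
  state: RoundState,
  playerId: string,
  optionId: number,
  now: number,
): AnswerOutcome {
  if (state.status !== "question") return "not_in_question_phase";
  if (now - state.questionStartTime >= ROUND_MS) return "late";
  state.answers.set(playerId, { optionId, timestamp: now });
  return "accepted";
}
```
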
---
## Message Broadcasting
### Room-Based Broadcasting
The server maintains a mapping of `lobbyCode → Set<WebSocket connections>`. When a message needs to be broadcast:
```typescript
// Pseudo-code from ws/connections.ts
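function broadcastToRoom(code: string, message: WebSocketMessage) {
  const connections = roomConnections.get(code);
  if (!connections) return; // no room registered under this code
  const payload = JSON.stringify(message); // serialize once for all recipients
  for (const ws of connections) {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(payload);
    }
  }
}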
```
**Self-broadcast:** The sender receives their own broadcast. The frontend must handle this (e.g., ignore their own lobby:state if they already updated optimistically).
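
The room mapping and the self-broadcast behavior can be exercised with fake sockets. The sketch below is an assumption-laden stand-in for `ws/connections.ts`: `SocketLike`, `joinRoom`, `leaveRoom`, and `broadcast` are hypothetical names chosen so the example runs without a WebSocket library.

```typescript
// Minimal stand-in for a WebSocket connection (hypothetical).
interface SocketLike {
  isOpen: boolean;
  send(data: string): void;
}

const rooms = new Map<string, Set<SocketLike>>();

function joinRoom(code: string, ws: SocketLike): void {
  let members = rooms.get(code);
  if (!members) {
    members = new Set<SocketLike>();
    rooms.set(code, members);
  }
  members.add(ws);
}

function leaveRoom(code: string, ws: SocketLike): void {
  const members = rooms.get(code);
  if (!members) return;
  members.delete(ws);
  if (members.size === 0) rooms.delete(code); // avoid leaking empty rooms
}

// Sends to every open socket in the room, sender included.
// Returns how many sockets received the message.
function broadcast(code: string, message: object): number {
  const members = rooms.get(code);
  if (!members) return 0;
  const payload = JSON.stringify(message);
  let sent = 0;
  members.forEach((ws) => {
    if (ws.isOpen) {
      ws.send(payload);
      sent += 1;
    }
  });
  return sent;
}
```

Nothing in `broadcast` excludes the originating socket, which is why the client has to tolerate seeing its own update echoed back.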
### Message Ordering
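WebSocket guarantees in-order delivery per connection, but nothing orders messages across connections or relative to the server timer. A race can look like this:
- Player A sends `game:answer` at 14.5s
- Player B's connection lags, so B's `game:answer` is still in flight when the round closes, and B receives `game:answer_result` for a round they believe they answered
- **Frontend must handle out-of-order messages gracefully**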
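
One way a client could tolerate this is to treat incoming events as idempotent state transitions and ignore anything stale. The sketch below assumes each server event carries a `round` number, which this document does not confirm; `ServerEvent`, `ClientView`, and `applyEvent` are hypothetical names.

```typescript
// Assumed event shapes: the `round` field is an assumption for illustration.
type ServerEvent =
  | { type: "game:question"; round: number }
  | { type: "game:answer_result"; round: number };

interface ClientView {
  round: number;
  phase: "question" | "result";
}

// Pure reducer: returns the same object when the event should be ignored.
function applyEvent(view: ClientView, event: ServerEvent): ClientView {
  if (event.round < view.round) return view; // stale: belongs to an earlier round
  if (event.type === "game:question") {
    // Ignore a repeated question for a round whose result is already shown.
    if (event.round === view.round && view.phase === "result") return view;
    return { round: event.round, phase: "question" };
  }
  // A result closes the round even if our own answer was still in flight.
  return { round: event.round, phase: "result" };
}
```
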
---
## Edge Cases & Failure Modes
### Mid-Game Disconnect
```
Player disconnects during question phase
Connection close handler triggered
Player NOT removed from game state (they might reconnect)
Timer continues
On timer expiry: player has no answer → treated as wrong
Result broadcast includes "disconnected" status for that player
```
**Current gap:** No reconnect-with-state-restoration. Player must re-join lobby and game state is not recovered. Planned in BACKLOG.md `next`.
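
The evaluation step for a round with a disconnected player might look like the following. It is a sketch only: `evaluateRound`, `PlayerResult`, and the input shapes are hypothetical, and real scoring lives in `multiplayerGameService.ts`.

```typescript
// Hypothetical result row for one player in one round.
interface PlayerResult {
  playerId: string;
  answered: boolean;
  correct: boolean;
  disconnected: boolean;
}

// A missing answer (timeout or disconnect) is simply treated as wrong.
function evaluateRound(
  players: { id: string; connected: boolean }[],
  answers: Map<string, number>,
  correctOptionId: number,
): PlayerResult[] {
  return players.map((p) => {
    const chosen = answers.get(p.id);
    return {
      playerId: p.id,
      answered: chosen !== undefined,
      correct: chosen === correctOptionId,
      disconnected: !p.connected,
    };
  });
}
```

Because the disconnected player stays in the `players` list, their row still appears in the result broadcast, flagged rather than omitted.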
### Double Join
```
Player joins lobby ABC
Player joins lobby ABC again (accidental double-click, retry)
Server: idempotent — player already in lobby, return 200
No duplicate entries in lobby_players table
```
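
Idempotent join reduces to "add to a set". In the sketch below a `Map` of `Set`s stands in for the `lobby_players` table, and `joinLobby` is a hypothetical name; a database would typically get the same effect from a uniqueness constraint on the lobby and player pair, though this document does not say how the real table enforces it.

```typescript
// In-memory stand-in for lobby_players rows (hypothetical).
const lobbyPlayers = new Map<string, Set<string>>();

// Joining twice succeeds both times and never creates a duplicate entry.
function joinLobby(
  code: string,
  userId: string,
): { alreadyMember: boolean; players: string[] } {
  let players = lobbyPlayers.get(code);
  if (!players) {
    players = new Set<string>();
    lobbyPlayers.set(code, players);
  }
  const alreadyMember = players.has(userId);
  players.add(userId); // Set.add is a no-op for an existing member
  return { alreadyMember, players: Array.from(players) };
}
```
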
### Rapid Start/Stop
```
Host clicks "Start" twice rapidly
First click: game starts, state changes to "in_progress"
Second click: server checks state, sees "in_progress", ignores
```
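
The double-click guard, together with the non-host rule from the timer table, can be sketched as a single check-and-set. `Lobby`, `LobbyStatus`, and `tryStart` are hypothetical names; only the `"in_progress"` value comes from the flow above.

```typescript
type LobbyStatus = "waiting" | "in_progress" | "finished";

interface Lobby {
  hostId: string;
  status: LobbyStatus;
}

// Returns true only for the first valid start request from the host.
function tryStart(lobby: Lobby, requesterId: string): boolean {
  if (requesterId !== lobby.hostId) return false; // non-host: ignored
  if (lobby.status !== "waiting") return false; // already started or finished
  lobby.status = "in_progress";
  return true;
}
```

The check and the assignment run synchronously, so two rapid `lobby:start` messages cannot interleave between them on a single Node.js event loop. If an `await` were placed between the check and the assignment, that guarantee would no longer hold.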
### Client-Side Message Loss
If a client's `game:answer` never reaches the server (network blip):
- Server never receives the answer
- Timer expires
- Player gets 0 points for that round
- **No retry mechanism** — client sends once, no ack expected
---
## Planned Improvements (Not Yet Implemented)
From BACKLOG.md `next`:
1. **Graceful WS reconnect** — Exponential back-off, restore game state on reconnection if game still in progress
2. **Heartbeat/ping** — Detect stale connections faster than TCP timeout
3. **Valkey for game state** — Replace in-memory store with Redis-compatible storage for horizontal scaling and persistence across restarts
4. **Configurable game settings** — Host sets round count, timer duration, target score via lobby settings jsonb column
5. **Additional game modes** — TV Quiz Show, Race to the Top, Chain Link, Elimination Round, Cooperative Challenge (see design/GAME_MODES.md)
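
For item 1, the delay schedule of an exponential back-off is simple to state in code. This is a sketch of the general technique only, since the feature is not implemented: `backoffDelayMs` and its default base and cap values are assumptions, not project settings.

```typescript
// Delay before reconnect attempt N (0-based): base * 2^N, capped.
// The 500ms base and 30s cap are illustrative values only.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// First five delays: 500, 1000, 2000, 4000, 8000 (milliseconds).
const firstDelays = [0, 1, 2, 3, 4].map((n) => backoffDelayMs(n));
```
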
---
## Key Files
| File | Purpose |
| ------------------------------------------------- | --------------------------------------------- |
| `apps/api/src/ws/index.ts` | WebSocket server setup, attach to HTTP server |
| `apps/api/src/ws/auth.ts` | Session validation on upgrade |
| `apps/api/src/ws/router.ts` | Message routing by `type` |
| `apps/api/src/ws/connections.ts` | Connection management, room mapping |
| `apps/api/src/ws/handlers/lobbyHandlers.ts` | lobby:join, lobby:leave, lobby:start |
| `apps/api/src/ws/handlers/gameHandlers.ts` | game:answer |
| `apps/api/src/services/multiplayerGameService.ts` | Game logic, timer, scoring |
| `apps/api/src/lobbyGameStore/` | In-memory lobby state storage |
| `packages/shared/src/schemas/lobby.ts` | WS message Zod schemas |
| `packages/shared/src/schemas/game.ts` | Game state Zod schemas |