updating documentation

This commit is contained in:
lila 2026-05-16 01:59:43 +02:00
parent 1ba57c7e9d
commit 7e0311683f
25 changed files with 2660 additions and 226 deletions

View file

@ -0,0 +1,221 @@
# 02 — Data Model
> **Purpose:** Database schema reference for LLMs working on features that query or modify data. Concatenate with 00-project-overview.md and 99-current-task.md.
> **Last updated:** 2026-05-15
> **Depends on:** 00-project-overview.md
---
## Core Tables
### `terms` — Language-neutral concepts
| Column | Type | Constraints | Notes |
| ------------ | --------- | -------------------------------------------- | ------------------------------------------------------ |
| `id` | uuid | PK | |
| `pos` | varchar | CHECK: `noun`, `verb`, `adjective`, `adverb` | Part of speech |
| `source` | varchar | | Pipeline that created this term (e.g. `kaikki`, `omw`) |
| `source_id` | varchar | UNIQUE(`source`, `source_id`) | Idempotency key for imports |
| `synset_id` | varchar | nullable | WordNet synset ID. Nullable for non-WordNet terms. |
| `created_at` | timestamp | default now() | |
**Rule:** One row per concept. The word "cat" (animal) and "cat" (nautical) are separate rows because they have different `source_id` values.
---
### `translations` — Per-language words
| Column | Type | Constraints | Notes |
| --------------- | ---------- | ----------------------------------- | ---------------------------------------- |
| `id` | uuid | PK | |
| `term_id` | uuid | FK → terms.id | |
| `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` | |
| `text` | varchar | | The actual word |
| `cefr_level` | varchar(2) | nullable, CHECK: `A1``C2` | Difficulty of THIS word in THIS language |
| `created_at` | timestamp | default now() | |
**Unique constraint:** (`term_id`, `language_code`, `text`) — allows synonyms (e.g. "dog" and "hound" for same term), prevents exact duplicates.
**Key design:** `cefr_level` is on `translations`, not `terms`. "House" in English is A1; "domicile" is also English but B2 — same concept, different words, different difficulty.
---
### `term_glosses` — Definitions per language
| Column | Type | Constraints | Notes |
| --------------- | ---------- | ----------------------------------- | ---------------------- |
| `id` | uuid | PK | |
| `term_id` | uuid | FK → terms.id | |
| `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` | |
| `text` | text | | Definition/explanation |
| `created_at` | timestamp | default now() | |
**Unique constraint:** (`term_id`, `language_code`) — one gloss per term per language. Prevents left joins from multiplying question rows.
**Note:** Italian gloss coverage is sparse (~2% of terms have Italian glosses). UI falls back to English gloss when no gloss exists for the user's language.
---
### `decks` — Curated wordlists
| Column | Type | Constraints | Notes |
| --------------------- | ------------ | ------------------------------------------------- | ------------------------------------------------------- |
| `id` | uuid | PK | |
| `name` | varchar | | e.g. `en-core-1000` |
| `source_language` | varchar(2) | CHECK | Language the wordlist was built from |
| `validated_languages` | varchar(2)[] | CHECK: source_language NOT IN validated_languages | Languages with complete translations for all deck terms |
| `description` | text | nullable | |
| `created_at` | timestamp | default now() | |
**Design:** One deck per frequency tier per source language. POS, difficulty, and category are query filters, not separate decks. Decks must not overlap — each term appears in exactly one tier.
**Source:** SUBTLEX frequency lists (per-language editions, same methodology).
---
### `deck_terms` — Junction table
| Column | Type | Constraints | Notes |
| ------------ | --------- | ------------- | ----- |
| `deck_id` | uuid | FK → decks.id | |
| `term_id` | uuid | FK → terms.id | |
| `created_at` | timestamp | default now() | |
**PK:** (`deck_id`, `term_id`)
---
## Auth Tables (managed by Better Auth)
Better Auth creates and owns these tables. Do not modify directly.
### `user`
| Column | Type | Notes |
| ---------------- | --------- | -------------------- |
| `id` | varchar | PK |
| `name` | varchar | Display name |
| `email` | varchar | |
| `email_verified` | boolean | |
| `image` | varchar | nullable, avatar URL |
| `created_at` | timestamp | |
| `updated_at` | timestamp | |
### `session`
| Column | Type | Notes |
| ------------ | --------- | ------------- |
| `id` | varchar | PK |
| `user_id` | varchar | FK → user.id |
| `token` | varchar | Session token |
| `expires_at` | timestamp | |
| `ip_address` | varchar | nullable |
| `user_agent` | text | nullable |
| `created_at` | timestamp | |
### `account` — Social provider links
| Column | Type | Notes |
| --------------- | --------- | -------------------- |
| `id` | varchar | PK |
| `user_id` | varchar | FK → user.id |
| `account_id` | varchar | Provider's user ID |
| `provider_id` | varchar | `google` or `github` |
| `access_token` | text | nullable |
| `refresh_token` | text | nullable |
| `id_token` | text | nullable |
| `expires_at` | timestamp | nullable |
**Note:** One user can have multiple accounts (Google + GitHub linked to same user).
### `verification`
Email verification tokens. Unused for social-only auth but managed by Better Auth.
---
## Lobby Tables (Multiplayer)
### `lobbies`
| Column | Type | Constraints | Notes |
| ------------- | --------- | ------------------------------------------- | -------------------------------------------- |
| `id` | uuid | PK | |
| `code` | varchar | UNIQUE | Human-readable room code (e.g. `WOLF-42`) |
| `host_id` | varchar | FK → user.id | |
| `status` | varchar | CHECK: `waiting`, `in_progress`, `finished` | |
| `max_players` | integer | default 4 | |
| `settings` | jsonb | nullable | Game mode, round count, timer duration, etc. |
| `created_at` | timestamp | default now() | |
| `updated_at` | timestamp | default now() | Used for stale recovery |
### `lobby_players`
| Column | Type | Constraints | Notes |
| -------------- | --------- | --------------- | ---------------------------- |
| `id` | uuid | PK | |
| `lobby_id` | uuid | FK → lobbies.id | |
| `user_id` | varchar | FK → user.id | |
| `display_name` | varchar | | Player's shown name in lobby |
| `is_host` | boolean | default false | |
| `joined_at` | timestamp | default now() | |
**Unique constraint:** (`lobby_id`, `user_id`) — one entry per player per lobby.
---
## Key Relationships
```
terms (1) ←──→ (N) translations
terms (1) ←──→ (N) term_glosses
terms (N) ←──→ (N) decks via deck_terms
user (1) ←──→ (N) sessions
user (1) ←──→ (N) accounts
user (1) ←──→ (N) lobbies (as host)
user (1) ←──→ (N) lobby_players
lobbies (1) ←──→ (N) lobby_players
```
---
## Query Patterns
### Get quiz terms (singleplayer)
```sql
SELECT t.id, t.pos, src.text AS source_text, tgt.text AS target_text, g.text AS gloss
FROM terms t
JOIN translations src ON src.term_id = t.id AND src.language_code = ?
JOIN translations tgt ON tgt.term_id = t.id AND tgt.language_code = ?
LEFT JOIN term_glosses g ON g.term_id = t.id AND g.language_code = ?
WHERE t.pos = ? AND tgt.cefr_level IN (?)
LIMIT ?
```
### Get distractors
```sql
SELECT text FROM translations
WHERE language_code = ? AND pos = ? AND cefr_level IN (?)
AND term_id != ? AND text != ?
ORDER BY RANDOM()
LIMIT 3
```
**Note:** This is the N+1 query mentioned in BACKLOG.md. Each question fetches 3 distractors separately. Batching is planned.
---
## Deferred Schema Extensions (Not Yet Implemented)
These tables are planned but do not exist yet. All are additive — they reference existing `terms` rows via FK.
| Table | Purpose | Trigger |
| --------------------- | ----------------------------------------------- | ----------------------- |
| `noun_forms` | Gender, singular, plural, articles per language | Grammar quiz mode |
| `verb_forms` | Conjugation tables per language | Grammar quiz mode |
| `term_pronunciations` | IPA + audio URLs per language | Pronunciation quiz mode |
| `user_decks` | Which decks a user studies | User customization |
| `user_term_progress` | Spaced repetition state per user/term/language | SRS review queue |
| `quiz_answers` | Answer history for stats/analytics | User stats dashboard |