# 02: Data Model

> **Purpose:** Database schema reference for LLMs working on features that query or modify data. Concatenate with 00-project-overview.md and 99-current-task.md.
> **Last updated:** 2026-05-15
> **Depends on:** 00-project-overview.md

---

## Core Tables

### `terms`: Language-neutral concepts

| Column       | Type      | Constraints                                  | Notes                                                  |
| ------------ | --------- | -------------------------------------------- | ------------------------------------------------------ |
| `id`         | uuid      | PK                                           |                                                        |
| `pos`        | varchar   | CHECK: `noun`, `verb`, `adjective`, `adverb` | Part of speech                                         |
| `source`     | varchar   |                                              | Pipeline that created this term (e.g. `kaikki`, `omw`) |
| `source_id`  | varchar   | UNIQUE(`source`, `source_id`)                | Idempotency key for imports                            |
| `synset_id`  | varchar   | nullable                                     | WordNet synset ID. Nullable for non-WordNet terms.     |
| `created_at` | timestamp | default now()                                |                                                        |

**Rule:** One row per concept. The word "cat" (animal) and "cat" (nautical) are separate rows because they have different `source_id` values.

---

### `translations`: Per-language words

| Column          | Type       | Constraints                         | Notes                                    |
| --------------- | ---------- | ----------------------------------- | ---------------------------------------- |
| `id`            | uuid       | PK                                  |                                          |
| `term_id`       | uuid       | FK → terms.id                       |                                          |
| `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` |                                          |
| `text`          | varchar    |                                     | The actual word                          |
| `cefr_level`    | varchar(2) | nullable, CHECK: `A1`–`C2`          | Difficulty of THIS word in THIS language |
| `created_at`    | timestamp  | default now()                       |                                          |

**Unique constraint:** (`term_id`, `language_code`, `text`). This allows synonyms (e.g. "dog" and "hound" for the same term) and prevents exact duplicates.

**Key design:** `cefr_level` is on `translations`, not `terms`. "House" in English is A1; "domicile" is also English but B2. Same concept, different words, different difficulty.

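
The `terms` and `translations` constraints above can be sketched as PostgreSQL DDL. This is a minimal sketch under stated assumptions, not the project's actual migration: the `gen_random_uuid()` defaults (built in from PostgreSQL 13), the sample `source_id`, and the Italian sample row are added purely for illustration.

```sql
-- Sketch only. Column names, types, and constraints follow the tables above;
-- the id defaults are an assumption so the sample inserts can omit ids.
CREATE TABLE terms (
  id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  pos        varchar CHECK (pos IN ('noun', 'verb', 'adjective', 'adverb')),
  source     varchar,
  source_id  varchar,
  synset_id  varchar,                       -- nullable for non-WordNet terms
  created_at timestamp DEFAULT now(),
  UNIQUE (source, source_id)                -- idempotency key for imports
);

CREATE TABLE translations (
  id            uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  term_id       uuid REFERENCES terms (id),
  language_code varchar(2) CHECK (language_code IN ('en', 'it', 'de', 'es', 'fr')),
  text          varchar,
  cefr_level    varchar(2) CHECK (cefr_level IN ('A1', 'A2', 'B1', 'B2', 'C1', 'C2')),
  created_at    timestamp DEFAULT now(),
  UNIQUE (term_id, language_code, text)     -- synonyms allowed, exact duplicates not
);

-- One concept, two English words with different CEFR levels, one Italian word.
WITH t AS (
  INSERT INTO terms (pos, source, source_id)
  VALUES ('noun', 'omw', 'example-house')   -- hypothetical source_id
  RETURNING id
)
INSERT INTO translations (term_id, language_code, text, cefr_level)
SELECT t.id, v.language_code, v.text, v.cefr_level
FROM t
CROSS JOIN (VALUES
  ('en', 'house',    'A1'),
  ('en', 'domicile', 'B2'),
  ('it', 'casa',     NULL)                  -- cefr_level is nullable
) AS v (language_code, text, cefr_level);
```

Inserting a second `('en', 'house')` row for the same term would be rejected by the unique constraint, while `domicile` is accepted as a synonym carrying its own CEFR level.
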

---

### `term_glosses`: Definitions per language

| Column          | Type       | Constraints                         | Notes                  |
| --------------- | ---------- | ----------------------------------- | ---------------------- |
| `id`            | uuid       | PK                                  |                        |
| `term_id`       | uuid       | FK → terms.id                       |                        |
| `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` |                        |
| `text`          | text       |                                     | Definition/explanation |
| `created_at`    | timestamp  | default now()                       |                        |

**Unique constraint:** (`term_id`, `language_code`). One gloss per term per language, which prevents left joins from multiplying question rows.

**Note:** Italian gloss coverage is sparse (~2% of terms have Italian glosses). The UI falls back to the English gloss when no gloss exists for the user's language.

---

### `decks`: Curated wordlists

| Column                | Type         | Constraints                                       | Notes                                                   |
| --------------------- | ------------ | ------------------------------------------------- | ------------------------------------------------------- |
| `id`                  | uuid         | PK                                                |                                                         |
| `name`                | varchar      |                                                   | e.g. `en-core-1000`                                     |
| `source_language`     | varchar(2)   | CHECK                                             | Language the wordlist was built from                    |
| `validated_languages` | varchar(2)[] | CHECK: source_language NOT IN validated_languages | Languages with complete translations for all deck terms |
| `description`         | text         | nullable                                          |                                                         |
| `created_at`          | timestamp    | default now()                                     |                                                         |

**Design:** One deck per frequency tier per source language. POS, difficulty, and category are query filters, not separate decks. Decks must not overlap: each term appears in exactly one tier.

**Source:** SUBTLEX frequency lists (per-language editions, same methodology).

---

### `deck_terms`: Junction table

| Column       | Type      | Constraints   | Notes |
| ------------ | --------- | ------------- | ----- |
| `deck_id`    | uuid      | FK → decks.id |       |
| `term_id`    | uuid      | FK → terms.id |       |
| `created_at` | timestamp | default now() |       |

**PK:** (`deck_id`, `term_id`)

---

## Auth Tables (managed by Better Auth)

Better Auth creates and owns these tables. Do not modify them directly.

### `user`

| Column           | Type      | Notes                |
| ---------------- | --------- | -------------------- |
| `id`             | varchar   | PK                   |
| `name`           | varchar   | Display name         |
| `email`          | varchar   |                      |
| `email_verified` | boolean   |                      |
| `image`          | varchar   | nullable, avatar URL |
| `created_at`     | timestamp |                      |
| `updated_at`     | timestamp |                      |

### `session`

| Column       | Type      | Notes         |
| ------------ | --------- | ------------- |
| `id`         | varchar   | PK            |
| `user_id`    | varchar   | FK → user.id  |
| `token`      | varchar   | Session token |
| `expires_at` | timestamp |               |
| `ip_address` | varchar   | nullable      |
| `user_agent` | text      | nullable      |
| `created_at` | timestamp |               |

### `account`: Social provider links

| Column          | Type      | Notes                |
| --------------- | --------- | -------------------- |
| `id`            | varchar   | PK                   |
| `user_id`       | varchar   | FK → user.id         |
| `account_id`    | varchar   | Provider's user ID   |
| `provider_id`   | varchar   | `google` or `github` |
| `access_token`  | text      | nullable             |
| `refresh_token` | text      | nullable             |
| `id_token`      | text      | nullable             |
| `expires_at`    | timestamp | nullable             |

**Note:** One user can have multiple accounts (Google + GitHub linked to the same user).

### `verification`

Email verification tokens. Unused for social-only auth but managed by Better Auth.

---

## Lobby Tables (Multiplayer)

### `lobbies`

| Column        | Type      | Constraints                                 | Notes                                        |
| ------------- | --------- | ------------------------------------------- | -------------------------------------------- |
| `id`          | uuid      | PK                                          |                                              |
| `code`        | varchar   | UNIQUE                                      | Human-readable room code (e.g. `WOLF-42`)    |
| `host_id`     | varchar   | FK → user.id                                |                                              |
| `status`      | varchar   | CHECK: `waiting`, `in_progress`, `finished` |                                              |
| `max_players` | integer   | default 4                                   |                                              |
| `settings`    | jsonb     | nullable                                    | Game mode, round count, timer duration, etc. |
| `created_at`  | timestamp | default now()                               |                                              |
| `updated_at`  | timestamp | default now()                               | Used for stale recovery                      |

### `lobby_players`

| Column         | Type      | Constraints     | Notes                        |
| -------------- | --------- | --------------- | ---------------------------- |
| `id`           | uuid      | PK              |                              |
| `lobby_id`     | uuid      | FK → lobbies.id |                              |
| `user_id`      | varchar   | FK → user.id    |                              |
| `display_name` | varchar   |                 | Player's shown name in lobby |
| `is_host`      | boolean   | default false   |                              |
| `joined_at`    | timestamp | default now()   |                              |

**Unique constraint:** (`lobby_id`, `user_id`). One entry per player per lobby.

---

## Key Relationships

```
terms   (1) ←──→ (N) translations
terms   (1) ←──→ (N) term_glosses
terms   (N) ←──→ (N) decks          via deck_terms
user    (1) ←──→ (N) session
user    (1) ←──→ (N) account
user    (1) ←──→ (N) lobbies        (as host)
user    (1) ←──→ (N) lobby_players
lobbies (1) ←──→ (N) lobby_players
```

---

## Query Patterns

### Get quiz terms (singleplayer)

```sql
SELECT t.id, t.pos, src.text AS source_text, tgt.text AS target_text, g.text AS gloss
FROM terms t
JOIN translations src ON src.term_id = t.id AND src.language_code = ?
JOIN translations tgt ON tgt.term_id = t.id AND tgt.language_code = ?
LEFT JOIN term_glosses g ON g.term_id = t.id AND g.language_code = ?
WHERE t.pos = ?
  AND tgt.cefr_level IN (?)
LIMIT ?
```

### Get distractors

```sql
SELECT tr.text
FROM translations tr
JOIN terms t ON t.id = tr.term_id  -- pos lives on terms, not translations
WHERE tr.language_code = ?
  AND t.pos = ?
  AND tr.cefr_level IN (?)
  AND tr.term_id != ?
  AND tr.text != ?
ORDER BY RANDOM()
LIMIT 3
```

**Note:** This is the N+1 query mentioned in BACKLOG.md. Each question runs its own query to fetch 3 distractors, so a quiz of N questions issues N extra queries. Batching is planned.

---

## Deferred Schema Extensions (Not Yet Implemented)

These tables are planned but do not exist yet. All are additive: they reference existing `terms` rows via FK.

| Table                 | Purpose                                         | Trigger                 |
| --------------------- | ----------------------------------------------- | ----------------------- |
| `noun_forms`          | Gender, singular, plural, articles per language | Grammar quiz mode       |
| `verb_forms`          | Conjugation tables per language                 | Grammar quiz mode       |
| `term_pronunciations` | IPA + audio URLs per language                   | Pronunciation quiz mode |
| `user_decks`          | Which decks a user studies                      | User customization      |
| `user_term_progress`  | Spaced repetition state per user/term/language  | SRS review queue        |
| `quiz_answers`        | Answer history for stats/analytics              | User stats dashboard    |
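
To illustrate what "additive" could look like in practice, here is a hypothetical sketch of one deferred table, `term_pronunciations`. The table does not exist yet, and every column name and type below is an assumption inferred from its one-line purpose (IPA + audio URLs per language) and from the conventions of the existing core tables. The eventual design may differ.

```sql
-- Hypothetical sketch: not an existing table, not a planned migration.
-- Column names (ipa, audio_url) and nullability are assumptions.
CREATE TABLE term_pronunciations (
  id            uuid PRIMARY KEY,
  term_id       uuid NOT NULL REFERENCES terms (id),   -- FK into the existing terms table
  language_code varchar(2) NOT NULL
                CHECK (language_code IN ('en', 'it', 'de', 'es', 'fr')),
  ipa           varchar,                                -- IPA transcription
  audio_url     varchar,                                -- link to an audio clip
  created_at    timestamp DEFAULT now()
);
```

Because the sketch only creates a new table that points at `terms`, no existing table is altered, and dropping the new table would restore the current schema.
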