11 KiB
02 — Data Model
Purpose: Database schema reference for LLMs working on features that query or modify data. Concatenate with 00-project-overview.md and 99-current-task.md. Last updated: 2026-05-15 Depends on: 00-project-overview.md
Core Tables
terms — Language-neutral concepts
| Column | Type | Constraints | Notes |
|---|---|---|---|
id |
uuid | PK | |
pos |
varchar | CHECK: noun, verb, adjective, adverb |
Part of speech |
source |
varchar | Pipeline that created this term (e.g. kaikki, omw) |
|
source_id |
varchar | UNIQUE(source, source_id) |
Idempotency key for imports |
synset_id |
varchar | nullable | WordNet synset ID. Nullable for non-WordNet terms. |
created_at |
timestamp | default now() |
Rule: One row per concept. The word "cat" (animal) and "cat" (nautical) are separate rows because they have different source_id values.
translations — Per-language words
| Column | Type | Constraints | Notes |
|---|---|---|---|
id |
uuid | PK | |
term_id |
uuid | FK → terms.id | |
language_code |
varchar(2) | CHECK: en, it, de, es, fr |
|
text |
varchar | The actual word | |
cefr_level |
varchar(2) | nullable, CHECK: A1–C2 |
Difficulty of THIS word in THIS language |
created_at |
timestamp | default now() |
Unique constraint: (term_id, language_code, text) — allows synonyms (e.g. "dog" and "hound" for same term), prevents exact duplicates.
Key design: cefr_level is on translations, not terms. "House" in English is A1; "domicile" is also English but B2 — same concept, different words, different difficulty.
term_glosses — Definitions per language
| Column | Type | Constraints | Notes |
|---|---|---|---|
id |
uuid | PK | |
term_id |
uuid | FK → terms.id | |
language_code |
varchar(2) | CHECK: en, it, de, es, fr |
|
text |
text | Definition/explanation | |
created_at |
timestamp | default now() |
Unique constraint: (term_id, language_code) — one gloss per term per language. Prevents left joins from multiplying question rows.
Note: Italian gloss coverage is sparse (~2% of terms have Italian glosses). UI falls back to English gloss when no gloss exists for the user's language.
decks — Curated wordlists
| Column | Type | Constraints | Notes |
|---|---|---|---|
id |
uuid | PK | |
name |
varchar | e.g. en-core-1000 |
|
source_language |
varchar(2) | CHECK | Language the wordlist was built from |
validated_languages |
varchar(2)[] | CHECK: source_language NOT IN validated_languages | Languages with complete translations for all deck terms |
description |
text | nullable | |
created_at |
timestamp | default now() |
Design: One deck per frequency tier per source language. POS, difficulty, and category are query filters, not separate decks. Decks must not overlap — each term appears in exactly one tier.
Source: SUBTLEX frequency lists (per-language editions, same methodology).
deck_terms — Junction table
| Column | Type | Constraints | Notes |
|---|---|---|---|
deck_id |
uuid | FK → decks.id | |
term_id |
uuid | FK → terms.id | |
created_at |
timestamp | default now() |
PK: (deck_id, term_id)
Auth Tables (managed by Better Auth)
Better Auth creates and owns these tables. Do not modify directly.
user
| Column | Type | Notes |
|---|---|---|
id |
varchar | PK |
name |
varchar | Display name |
email |
varchar | |
email_verified |
boolean | |
image |
varchar | nullable, avatar URL |
created_at |
timestamp | |
updated_at |
timestamp |
session
| Column | Type | Notes |
|---|---|---|
id |
varchar | PK |
user_id |
varchar | FK → user.id |
token |
varchar | Session token |
expires_at |
timestamp | |
ip_address |
varchar | nullable |
user_agent |
text | nullable |
created_at |
timestamp |
account — Social provider links
| Column | Type | Notes |
|---|---|---|
id |
varchar | PK |
user_id |
varchar | FK → user.id |
account_id |
varchar | Provider's user ID |
provider_id |
varchar | google or github |
access_token |
text | nullable |
refresh_token |
text | nullable |
id_token |
text | nullable |
expires_at |
timestamp | nullable |
Note: One user can have multiple accounts (Google + GitHub linked to same user).
verification
Email verification tokens. Unused for social-only auth but managed by Better Auth.
Lobby Tables (Multiplayer)
lobbies
| Column | Type | Constraints | Notes |
|---|---|---|---|
id |
uuid | PK | |
code |
varchar | UNIQUE | Human-readable room code (e.g. WOLF-42) |
host_id |
varchar | FK → user.id | |
status |
varchar | CHECK: waiting, in_progress, finished |
|
max_players |
integer | default 4 | |
settings |
jsonb | nullable | Game mode, round count, timer duration, etc. |
created_at |
timestamp | default now() | |
updated_at |
timestamp | default now() | Used for stale recovery |
lobby_players
| Column | Type | Constraints | Notes |
|---|---|---|---|
id |
uuid | PK | |
lobby_id |
uuid | FK → lobbies.id | |
user_id |
varchar | FK → user.id | |
display_name |
varchar | Player's shown name in lobby | |
is_host |
boolean | default false | |
joined_at |
timestamp | default now() |
Unique constraint: (lobby_id, user_id) — one entry per player per lobby.
Key Relationships
terms (1) ←──→ (N) translations
terms (1) ←──→ (N) term_glosses
terms (N) ←──→ (N) decks via deck_terms
user (1) ←──→ (N) sessions
user (1) ←──→ (N) accounts
user (1) ←──→ (N) lobbies (as host)
user (1) ←──→ (N) lobby_players
lobbies (1) ←──→ (N) lobby_players
Query Patterns
Get quiz terms (singleplayer)
SELECT t.id, t.pos, src.text AS source_text, tgt.text AS target_text, g.text AS gloss
FROM terms t
JOIN translations src ON src.term_id = t.id AND src.language_code = ?
JOIN translations tgt ON tgt.term_id = t.id AND tgt.language_code = ?
LEFT JOIN term_glosses g ON g.term_id = t.id AND g.language_code = ?
WHERE t.pos = ? AND tgt.cefr_level IN (?)
LIMIT ?
Get distractors
SELECT text FROM translations
WHERE language_code = ? AND pos = ? AND cefr_level IN (?)
AND term_id != ? AND text != ?
ORDER BY RANDOM()
LIMIT 3
Note: This is the N+1 query mentioned in BACKLOG.md. Each question fetches 3 distractors separately. Batching is planned.
Deferred Schema Extensions (Not Yet Implemented)
These tables are planned but do not exist yet. All are additive — they reference existing terms rows via FK.
| Table | Purpose | Trigger |
|---|---|---|
noun_forms |
Gender, singular, plural, articles per language | Grammar quiz mode |
verb_forms |
Conjugation tables per language | Grammar quiz mode |
term_pronunciations |
IPA + audio URLs per language | Pronunciation quiz mode |
user_decks |
Which decks a user studies | User customization |
user_term_progress |
Spaced repetition state per user/term/language | SRS review queue |
quiz_answers |
Answer history for stats/analytics | User stats dashboard |