# 02 — Data Model

> **Purpose:** Database schema reference for LLMs working on features that query or modify data. Concatenate with 00-project-overview.md and 99-current-task.md.
> **Last updated:** 2026-05-15
> **Depends on:** 00-project-overview.md

---

## Core Tables

### `terms` — Language-neutral concepts

| Column | Type | Constraints | Notes |
| ------------ | --------- | -------------------------------------------- | ------------------------------------------------------ |
| `id` | uuid | PK | |
| `pos` | varchar | CHECK: `noun`, `verb`, `adjective`, `adverb` | Part of speech |
| `source` | varchar | | Pipeline that created this term (e.g. `kaikki`, `omw`) |
| `source_id` | varchar | UNIQUE(`source`, `source_id`) | Idempotency key for imports |
| `synset_id` | varchar | nullable | WordNet synset ID. Nullable for non-WordNet terms. |
| `created_at` | timestamp | default now() | |

**Rule:** One row per concept. The word "cat" (animal) and "cat" (nautical) are separate rows because they have different `source_id` values.

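
The `UNIQUE(source, source_id)` key is what lets an import be re-run safely. A minimal sketch of an idempotent insert, assuming PostgreSQL 13+ (for the built-in `gen_random_uuid()`) and assuming IDs are generated in SQL, which this document does not specify. The `source_id` value is invented for illustration.

```sql
-- Hypothetical import row; 'cat-noun-1' is a made-up source_id.
INSERT INTO terms (id, pos, source, source_id, synset_id)
VALUES (gen_random_uuid(), 'noun', 'kaikki', 'cat-noun-1', NULL)
-- Re-running the import hits the unique key and inserts nothing.
ON CONFLICT (source, source_id) DO NOTHING;
```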
---
### `translations` — Per-language words

| Column | Type | Constraints | Notes |
| --------------- | ---------- | ----------------------------------- | ---------------------------------------- |
| `id` | uuid | PK | |
| `term_id` | uuid | FK → terms.id | |
| `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` | |
| `text` | varchar | | The actual word |
| `cefr_level` | varchar(2) | nullable, CHECK: `A1`–`C2` | Difficulty of THIS word in THIS language |
| `created_at` | timestamp | default now() | |

**Unique constraint:** (`term_id`, `language_code`, `text`). This allows synonyms (e.g. "dog" and "hound" for the same term) while preventing exact duplicates.

**Key design:** `cefr_level` lives on `translations`, not `terms`. "House" is A1 in English, while "domicile" is also English but B2: same concept, different words, different difficulty.

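
As an illustration of the two points above, the sketch below stores two English synonyms for one term at different levels and then reads them back. It assumes PostgreSQL 13+ (`gen_random_uuid()`) and that every `?` is bound to the same existing `terms.id`. The CEFR values are the ones from the example, not verified data.

```sql
-- Two English words for the same concept, each with its own difficulty.
INSERT INTO translations (id, term_id, language_code, text, cefr_level)
VALUES
  (gen_random_uuid(), ?, 'en', 'house',    'A1'),
  (gen_random_uuid(), ?, 'en', 'domicile', 'B2')
-- Exact duplicates are rejected by the unique constraint; skip them quietly.
ON CONFLICT (term_id, language_code, text) DO NOTHING;

-- All English words for the term, easiest first.
-- CEFR codes sort correctly as plain strings (A1 < A2 < B1 < ... < C2).
SELECT text, cefr_level
FROM translations
WHERE term_id = ? AND language_code = 'en'
ORDER BY cefr_level;
```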
---
### `term_glosses` — Definitions per language

| Column | Type | Constraints | Notes |
| --------------- | ---------- | ----------------------------------- | ---------------------- |
| `id` | uuid | PK | |
| `term_id` | uuid | FK → terms.id | |
| `language_code` | varchar(2) | CHECK: `en`, `it`, `de`, `es`, `fr` | |
| `text` | text | | Definition/explanation |
| `created_at` | timestamp | default now() | |

**Unique constraint:** (`term_id`, `language_code`): one gloss per term per language. This keeps the `LEFT JOIN` on `term_glosses` from multiplying question rows.

**Note:** Italian gloss coverage is sparse (~2% of terms have an Italian gloss). The UI falls back to the English gloss when none exists for the user's language.

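
The fallback described above happens in the UI. For cases where it is more convenient to resolve it in the query, one possible equivalent is sketched below. This is an assumption about how it could be done, not the project's implementation. Because of the unique constraint, each join matches at most one row.

```sql
SELECT t.id,
       -- Prefer the user's language; fall back to English; NULL if neither exists.
       COALESCE(g_user.text, g_en.text) AS gloss
FROM terms t
LEFT JOIN term_glosses g_user
       ON g_user.term_id = t.id AND g_user.language_code = ?  -- user's language
LEFT JOIN term_glosses g_en
       ON g_en.term_id = t.id AND g_en.language_code = 'en'
WHERE t.id = ?;
```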
---
### `decks` — Curated wordlists

| Column | Type | Constraints | Notes |
| --------------------- | ------------ | ------------------------------------------------- | ------------------------------------------------------- |
| `id` | uuid | PK | |
| `name` | varchar | | e.g. `en-core-1000` |
| `source_language` | varchar(2) | CHECK | Language the wordlist was built from |
| `validated_languages` | varchar(2)[] | CHECK: source_language NOT IN validated_languages | Languages with complete translations for all deck terms |
| `description` | text | nullable | |
| `created_at` | timestamp | default now() | |

**Design:** One deck per frequency tier per source language. POS, difficulty, and category are query filters, not separate decks. Decks must not overlap: each term appears in exactly one tier.

**Source:** SUBTLEX frequency lists (per-language editions, same methodology).

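
Two queries that follow from these rules, sketched for PostgreSQL. The first picks decks usable for a given language pair via the array column. The second is a sanity check for the no-overlap rule. It assumes overlap is judged per source language, which the text above does not state explicitly.

```sql
-- English-sourced decks whose terms are fully translated into Italian.
SELECT id, name
FROM decks
WHERE source_language = 'en'
  AND 'it' = ANY (validated_languages);

-- Terms that sit in more than one deck of the same source language.
-- An empty result means the no-overlap rule holds.
SELECT dt.term_id, d.source_language, COUNT(*) AS deck_count
FROM deck_terms dt
JOIN decks d ON d.id = dt.deck_id
GROUP BY dt.term_id, d.source_language
HAVING COUNT(*) > 1;
```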
---
### `deck_terms` — Junction table

| Column | Type | Constraints | Notes |
| ------------ | --------- | ------------- | ----- |
| `deck_id` | uuid | FK → decks.id | |
| `term_id` | uuid | FK → terms.id | |
| `created_at` | timestamp | default now() | |

**PK:** (`deck_id`, `term_id`)

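
Since POS and difficulty are query filters rather than separate decks, reading a deck usually means joining through this table and filtering on `terms`. A minimal sketch, using the example deck name from the `decks` table and an arbitrary POS filter:

```sql
-- Verbs in one deck, shown in the deck's own source language.
SELECT t.id, t.pos, tr.text
FROM deck_terms dt
JOIN decks d         ON d.id = dt.deck_id
JOIN terms t         ON t.id = dt.term_id
JOIN translations tr ON tr.term_id = t.id
                    AND tr.language_code = d.source_language
WHERE d.name = 'en-core-1000'
  AND t.pos = 'verb';
```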
---

## Auth Tables (managed by Better Auth)

Better Auth creates and owns these tables. Do not modify directly.

### `user`

| Column | Type | Notes |
| ---------------- | --------- | -------------------- |
| `id` | varchar | PK |
| `name` | varchar | Display name |
| `email` | varchar | |
| `email_verified` | boolean | |
| `image` | varchar | nullable, avatar URL |
| `created_at` | timestamp | |
| `updated_at` | timestamp | |

### `session`

| Column | Type | Notes |
| ------------ | --------- | ------------- |
| `id` | varchar | PK |
| `user_id` | varchar | FK → user.id |
| `token` | varchar | Session token |
| `expires_at` | timestamp | |
| `ip_address` | varchar | nullable |
| `user_agent` | text | nullable |
| `created_at` | timestamp | |

### `account` — Social provider links

| Column | Type | Notes |
| --------------- | --------- | -------------------- |
| `id` | varchar | PK |
| `user_id` | varchar | FK → user.id |
| `account_id` | varchar | Provider's user ID |
| `provider_id` | varchar | `google` or `github` |
| `access_token` | text | nullable |
| `refresh_token` | text | nullable |
| `id_token` | text | nullable |
| `expires_at` | timestamp | nullable |

**Note:** One user can have multiple accounts (Google + GitHub linked to same user).

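
A read-only sketch for listing the providers linked to a user, assuming PostgreSQL and the column names exactly as listed in this document (the names Better Auth actually generates depend on its configuration). Note that `user` is a reserved word in PostgreSQL, so the table name has to be double-quoted in raw SQL.

```sql
SELECT u.id, u.email, a.provider_id
FROM "user" u                     -- quoted because user is a reserved word
JOIN account a ON a.user_id = u.id
WHERE u.id = ?
ORDER BY a.provider_id;           -- e.g. one row for github, one for google
```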
### `verification`

Stores email verification tokens. Unused while auth is social-only, but Better Auth still creates and manages it.

---

## Lobby Tables (Multiplayer)

### `lobbies`

| Column | Type | Constraints | Notes |
| ------------- | --------- | ------------------------------------------- | -------------------------------------------- |
| `id` | uuid | PK | |
| `code` | varchar | UNIQUE | Human-readable room code (e.g. `WOLF-42`) |
| `host_id` | varchar | FK → user.id | |
| `status` | varchar | CHECK: `waiting`, `in_progress`, `finished` | |
| `max_players` | integer | default 4 | |
| `settings` | jsonb | nullable | Game mode, round count, timer duration, etc. |
| `created_at` | timestamp | default now() | |
| `updated_at` | timestamp | default now() | Used for stale recovery |

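
The table only says that `updated_at` is used for stale recovery. One way such a sweep could look is sketched below. Both the 30-minute threshold and the choice to mark stale lobbies `finished` (rather than delete them) are assumptions for illustration, not documented behavior.

```sql
-- Close lobbies that have not been touched recently.
UPDATE lobbies
SET status = 'finished',
    updated_at = now()
WHERE status IN ('waiting', 'in_progress')
  AND updated_at < now() - INTERVAL '30 minutes';  -- assumed threshold
```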
### `lobby_players`

| Column | Type | Constraints | Notes |
| -------------- | --------- | --------------- | ---------------------------- |
| `id` | uuid | PK | |
| `lobby_id` | uuid | FK → lobbies.id | |
| `user_id` | varchar | FK → user.id | |
| `display_name` | varchar | | Player's shown name in lobby |
| `is_host` | boolean | default false | |
| `joined_at` | timestamp | default now() | |

**Unique constraint:** (`lobby_id`, `user_id`): one entry per player per lobby.

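
Because each player has exactly one row per lobby, counting rows gives the current occupancy. A minimal sketch of a capacity lookup by room code follows. How the join handler uses the result, and how it guards against two players joining at the same moment, is outside what this document specifies.

```sql
-- Current occupancy for one lobby; the caller compares player_count to max_players.
SELECT l.code, l.status, l.max_players, COUNT(p.id) AS player_count
FROM lobbies l
LEFT JOIN lobby_players p ON p.lobby_id = l.id
WHERE l.code = ?
GROUP BY l.id;  -- l.id is the PK, so the other lobby columns may be selected
```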
---

## Key Relationships

```
terms (1) ←──→ (N) translations
terms (1) ←──→ (N) term_glosses
terms (N) ←──→ (N) decks via deck_terms
user (1) ←──→ (N) session
user (1) ←──→ (N) account
user (1) ←──→ (N) lobbies (as host)
user (1) ←──→ (N) lobby_players
lobbies (1) ←──→ (N) lobby_players
```

---

## Query Patterns

### Get quiz terms (singleplayer)

```sql
-- Placeholders, in order: source language, target language, gloss language,
-- part of speech, list of allowed CEFR levels, row limit.
SELECT t.id, t.pos, src.text AS source_text, tgt.text AS target_text, g.text AS gloss
FROM terms t
JOIN translations src ON src.term_id = t.id AND src.language_code = ?
JOIN translations tgt ON tgt.term_id = t.id AND tgt.language_code = ?
LEFT JOIN term_glosses g ON g.term_id = t.id AND g.language_code = ?
WHERE t.pos = ? AND tgt.cefr_level IN (?)
LIMIT ?
```

### Get distractors

```sql
-- pos lives on terms, not translations, so the filter needs a join.
SELECT tr.text
FROM translations tr
JOIN terms t ON t.id = tr.term_id
WHERE tr.language_code = ? AND t.pos = ? AND tr.cefr_level IN (?)
  AND tr.term_id != ? AND tr.text != ?
ORDER BY RANDOM()
LIMIT 3
```

**Note:** This is the N+1 query mentioned in BACKLOG.md. Each question runs it separately to fetch its 3 distractors. Batching is planned.

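
One possible shape for the batched version is a PostgreSQL `LATERAL` join that picks 3 random distractors per question in a single round trip. This is a sketch under assumptions, not the planned design. It assumes the driver can bind arrays to `?` (a `uuid[]` of question term IDs and a `varchar[]` of CEFR levels) and that both language placeholders receive the target language. If a question term has several target-language synonyms, it yields 3 rows per synonym.

```sql
SELECT q.term_id, d.text AS distractor
FROM (
  -- The correct answers for the current batch of questions.
  SELECT tgt.term_id, tgt.text, t.pos
  FROM translations tgt
  JOIN terms t ON t.id = tgt.term_id
  WHERE tgt.language_code = ?
    AND tgt.term_id = ANY (?)          -- uuid[] of question term ids
) q
CROSS JOIN LATERAL (
  -- Re-evaluated per question row: same language and POS, different term and text.
  SELECT tr.text
  FROM translations tr
  JOIN terms t2 ON t2.id = tr.term_id
  WHERE tr.language_code = ?
    AND t2.pos = q.pos
    AND tr.cefr_level = ANY (?)        -- varchar[] of allowed CEFR levels
    AND tr.term_id != q.term_id
    AND tr.text != q.text
  ORDER BY RANDOM()
  LIMIT 3
) d;
```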
---

## Deferred Schema Extensions (Not Yet Implemented)

These tables are planned but do not exist yet. All are additive: they reference existing `terms` rows via FK.

| Table | Purpose | Trigger (feature that requires it) |
| --------------------- | ----------------------------------------------- | ---------------------------------- |
| `noun_forms` | Gender, singular, plural, articles per language | Grammar quiz mode |
| `verb_forms` | Conjugation tables per language | Grammar quiz mode |
| `term_pronunciations` | IPA + audio URLs per language | Pronunciation quiz mode |
| `user_decks` | Which decks a user studies | User customization |
| `user_term_progress` | Spaced repetition state per user/term/language | SRS review queue |
| `quiz_answers` | Answer history for stats/analytics | User stats dashboard |