From c49c2fe2c30c6bb64b5b64a8e8b417539a5a6611 Mon Sep 17 00:00:00 2001
From: lila <beiweitemderbeste@protonmail.com>
Date: Sun, 5 Apr 2026 19:28:53 +0200
Subject: [PATCH] docs: mark Phase 2 data model complete, drop deferred media metadata options

---
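The Phase 2 checklist in this patch drops `language_pairs` and derives pairs from decks at query time. Below is a minimal sketch of that derivation, under these assumptions: each deck exposes `source_language` and `validated_languages` (both named in decisions.md), and a pair is a source language plus one validated target. The `Deck` and `LanguagePair` types and the `deriveLanguagePairs` name are hypothetical, and the real derivation may well run as a SQL query rather than in application code.

```ts
// Hypothetical deck shape: only the two fields named in decisions.md are assumed.
interface Deck {
  source_language: string;
  validated_languages: string[];
}

// Hypothetical result shape for one direction of study.
interface LanguagePair {
  source: string;
  target: string;
}

// Collects the distinct (source, target) pairs implied by a set of decks,
// instead of reading them from a dedicated language_pairs table.
function deriveLanguagePairs(decks: Deck[]): LanguagePair[] {
  const seen = new Set<string>();
  const pairs: LanguagePair[] = [];
  for (const deck of decks) {
    for (const target of deck.validated_languages) {
      // A language paired with itself is not a useful study direction.
      if (target === deck.source_language) continue;
      const key = `${deck.source_language}->${target}`;
      if (!seen.has(key)) {
        seen.add(key);
        pairs.push({ source: deck.source_language, target });
      }
    }
  }
  return pairs;
}
```

Direction matters here: an `en` deck validated for `it` and an `it` deck validated for `en` yield two separate pairs, which matches the per-direction wordlists described in decisions.md.
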
 documentation/decisions.md | 52 ++++++++++----------------------------
 1 file changed, 14 insertions(+), 38 deletions(-)

diff --git a/documentation/decisions.md b/documentation/decisions.md
index 75e2d52..0fc1244 100644
--- a/documentation/decisions.md
+++ b/documentation/decisions.md
@@ -228,37 +228,6 @@ This is why `decks.source_language` is not just a technical detail — it is the
 
 Same translation data underneath, correctly frequency-grounded per direction. Two wordlist files, two generation script runs.
 
-### Decks: media metadata structure (post-MVP, options documented)
-
-When the Media hierarchy is implemented, each media type (TV show, movie, book, song)
-has different attributes. Three options considered:
-
-**Option A: One table with nullable columns**
-All media types in one table, type-specific columns nullable. Simple but becomes a sparse
-matrix as media types grow.
-
-**Option B: Separate table per media type**
-```ts
-tv_metadata:    deck_id, title, season, episode
-movie_metadata: deck_id, title, year
-book_metadata:  deck_id, title, author, year
-song_metadata:  deck_id, title, artist, album, year
-```
-Each table has exactly the right columns. Clean and queryable, more tables to maintain.
-
-**Option C: JSONB for flexible attributes**
-```ts
-media_metadata: deck_id, media_type, title, attributes jsonb
-```
-Type-specific fields in a JSON blob. No migration needed for new media types but
-attributes are not schema-validated and harder to query.
-
-**Current recommendation:** Option A to start (few media types initially, sparse
-columns manageable), migrate to Option B if the number of media types grows.
-Option C only if media types become numerous and unpredictable.
-
-Decision deferred until Media is actually built.
-
 ### Terms: `synset_id` nullable (not NOT NULL)
 
 **Problem:** non-WordNet terms (custom words, Wiktionary-sourced entries added later) won't have a synset ID. `NOT NULL` is too strict.
@@ -401,7 +370,7 @@ Too expensive at scale — only viable for small curated additions on top of an
 
 ## Current State
 
-Phase 0 complete. Phase 1 data pipeline complete.
+Phase 0 complete. Phase 1 data pipeline complete. Phase 2 data model finalized and migrated.
 
 ### Completed (Phase 1 — data pipeline)
 
@@ -417,19 +386,26 @@ Phase 0 complete. Phase 1 data pipeline complete.
   - creates deck if it doesn't exist, adds only missing terms on subsequent runs
   - recalculates and persists `validated_languages` on every run
 
-### Known data facts
+### Completed (Phase 2 — data model)
+
+- [x] `synset_id` removed, replaced by `source` + `source_id` on `terms`
+- [x] `cefr_level` added to `translations` (not `terms` — difficulty is language-relative)
+- [x] `language_code` CHECK constraint added to `translations` and `term_glosses`
+- [x] `language_pairs` table dropped — pairs derived from decks at query time
+- [x] `is_public` and `added_at` dropped from `decks` and `deck_terms`
+- [x] `type` added to `decks` with CHECK against `SUPPORTED_DECK_TYPES`
+- [x] `topics` and `term_topics` tables added (empty for MVP)
+- [x] Migration generated and run against fresh database
+
+### Known data facts (pre-wipe, for reference)
 
 - Wordlist: 999 unique words after deduplication (1000 lines, 1 duplicate)
 - Term IDs resolved: 3171 (higher than word count due to homonyms)
 - Words not found in DB: 34
 - Italian (`it`) coverage: 3171 / 3171 — full coverage, included in `validated_languages`
 
-### Next (Phase 2 — data model + pipeline)
+### Next (Phase 3 — data pipeline + API)
 
-Roadmap to API implementation:
-
-1. **Finalize data model** — apply decisions above: `synset_id` nullable, add `source` + `source_id` to `terms`, add `cefr_level` to `translations`, add `categories` + `term_categories` tables, add `language_code` CHECK to `translations` and `term_glosses`, drop `language_pairs`
-2. **Write and run migrations** — schema changes before any data expansion
 3. **Expand data pipeline** — import all OMW languages and POS, not just English nouns with Italian translations
 4. **Decide SUBTLEX → `cefr_level` mapping strategy** — raw frequency ranks need a mapping to A1–C2 bands before tiered decks are meaningful
 5. **Generate decks** — run generation script with SUBTLEX-grounded wordlists per source language