Commit graph

4 commits

Author SHA1 Message Date
lila
76af2ab093 fix: update db import validation tests to account for reverse links
- Translation count test now adds reverse link count to expected total
- Non-English translations test now filters to kaikki source only
- Target language test now filters to kaikki source only — reverse links
  to English are valid and expected
2026-05-05 19:10:19 +02:00
lila
ba2635e3f7 feat: add stage 1 and db import validation tests for Kaikki schema 2026-05-05 18:51:11 +02:00
lila
209d52f54b feat: add Kaikki extraction and import scripts for stage 1
- Add stage-1-extract/scripts/extract.ts — streams Kaikki JSONL,
  filters to supported POS and languages, skips abbreviations and
  senses with no translations in supported languages
- Rewrite db/import.ts for Kaikki flat model — tracks sense_index
  offsets per headword+pos to handle duplicate JSONL entries
- Rewrite db/schema.sql for Kaikki model — entries, translations,
  LLM vote tables, resolved tables
- Add extract and db:import scripts to package.json
- Sample mode hardcoded to 500 entries for development
2026-05-05 18:11:53 +02:00
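
The sense_index offset tracking mentioned for db/import.ts in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `KaikkiEntry` and `SenseRow` shapes, the field names, and the `assignSenseIndexes` helper are all hypothetical. The idea is to keep a running counter per headword+pos, so that a second JSONL entry with the same key continues numbering its senses instead of restarting at 0.

```typescript
// Hypothetical input shape; the real Kaikki JSONL records and the repo's
// types may differ.
interface KaikkiEntry {
  word: string;
  pos: string;
  senses: { glosses: string[] }[];
}

// Hypothetical output row; column names are assumptions, not the real schema.
interface SenseRow {
  headword: string;
  pos: string;
  senseIndex: number;
  gloss: string;
}

// Assigns a sense index that keeps counting across duplicate entries
// sharing the same headword and part of speech.
function assignSenseIndexes(entries: KaikkiEntry[]): SenseRow[] {
  // Next free sense index for each "headword<TAB>pos" key.
  const nextIndex = new Map<string, number>();
  const rows: SenseRow[] = [];

  for (const entry of entries) {
    const key = `${entry.word}\t${entry.pos}`;
    const offset = nextIndex.get(key) ?? 0;

    entry.senses.forEach((sense, i) => {
      rows.push({
        headword: entry.word,
        pos: entry.pos,
        senseIndex: offset + i,
        gloss: sense.glosses[0] ?? "",
      });
    });

    // Later entries with the same key start after this entry's senses.
    nextIndex.set(key, offset + entry.senses.length);
  }
  return rows;
}

// Usage: two "bank"/noun entries share one counter, "bank"/verb gets its own.
const demoRows = assignSenseIndexes([
  { word: "bank", pos: "noun", senses: [{ glosses: ["a"] }, { glosses: ["b"] }] },
  { word: "bank", pos: "noun", senses: [{ glosses: ["c"] }] },
  { word: "bank", pos: "verb", senses: [{ glosses: ["d"] }] },
]);
console.log(demoRows.map((r) => `${r.headword}/${r.pos}#${r.senseIndex}`).join(" "));
// prints "bank/noun#0 bank/noun#1 bank/noun#2 bank/verb#0"
```

Keying the counter on headword+pos rather than on the headword alone keeps the noun and verb senses of the same word in separate numbering ranges, which matches the per headword+pos wording in the commit message.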
lila
4d42fe4397 removing db from git tracking, adding it to gitignore, add db import validation tests 2026-05-03 22:16:43 +02:00