lila/data-pipeline/db
lila 209d52f54b feat: add Kaikki extraction and import scripts for stage 1
- Add stage-1-extract/scripts/extract.ts — streams Kaikki JSONL,
  filters to supported POS and languages, skips abbreviations and
  senses with no translations in supported languages
- Rewrite db/import.ts for Kaikki flat model — tracks sense_index
  offsets per headword+pos to handle duplicate JSONL entries
- Rewrite db/schema.sql for Kaikki model — entries, translations,
  LLM vote tables, resolved tables
- Add extract and db:import scripts to package.json
- Sample mode hardcoded to 500 entries for development
2026-05-05 18:11:53 +02:00
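
The filtering rules listed in the commit message for extract.ts could look roughly like the sketch below. It is not the code from the repository. The field names (`word`, `pos`, `senses`, `tags`, `translations`, `code`) are assumed from the general shape of Kaikki JSONL records, and the `SUPPORTED_POS` and `SUPPORTED_LANGS` sets, the `"abbreviation"` sense tag check, and the `limit` parameter (standing in for the 500-entry sample mode) are all hypothetical.

```typescript
// Simplified, assumed record shapes; real Kaikki records carry many more fields.
interface RawTranslation { code?: string; word?: string }
interface RawSense { glosses?: string[]; tags?: string[]; translations?: RawTranslation[] }
interface RawEntry { word: string; pos: string; senses?: RawSense[] }

// Hypothetical configuration; the actual supported sets are not given in the commit.
const SUPPORTED_POS = new Set(["noun", "verb", "adj"]);
const SUPPORTED_LANGS = new Set(["de", "fr", "es"]);

// Returns a trimmed copy of the entry, or null if nothing usable remains.
function filterEntry(entry: RawEntry): RawEntry | null {
  if (!SUPPORTED_POS.has(entry.pos)) return null;
  const senses = (entry.senses ?? [])
    // Assumption: abbreviations are marked with an "abbreviation" sense tag.
    .filter((s) => !(s.tags ?? []).includes("abbreviation"))
    // Keep only translations into supported languages.
    .map((s) => ({
      ...s,
      translations: (s.translations ?? []).filter(
        (t) => t.code !== undefined && SUPPORTED_LANGS.has(t.code),
      ),
    }))
    // Drop senses left without any supported translation.
    .filter((s) => s.translations.length > 0);
  return senses.length > 0 ? { ...entry, senses } : null;
}

// Processes one JSONL line at a time, so the whole dump never sits in memory.
// `limit` caps the number of kept entries (e.g. 500 for a sample run).
function* extractLines(lines: Iterable<string>, limit = Infinity): Generator<RawEntry> {
  let kept = 0;
  for (const line of lines) {
    if (kept >= limit) return;
    if (line.trim() === "") continue;
    const filtered = filterEntry(JSON.parse(line) as RawEntry);
    if (filtered !== null) {
      kept++;
      yield filtered;
    }
  }
}
```

In a real script the `lines` argument would come from a line reader over the dump file rather than an in-memory array; an iterable is used here only to keep the sketch self-contained.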
import.ts feat: add Kaikki extraction and import scripts for stage 1 2026-05-05 18:11:53 +02:00
index.ts feat: add db schema, init, and vitest config 2026-05-03 17:56:29 +02:00
init.ts feat: add pipeline orchestrator skeleton with startup checks, stage runners, shutdown handler, and report generation 2026-05-03 23:01:29 +02:00
pipeline.db-shm removing db from git tracking, adding it to gitignore, add db import validation tests 2026-05-03 22:16:43 +02:00
pipeline.db-wal removing db from git tracking, adding it to gitignore, add db import validation tests 2026-05-03 22:16:43 +02:00
schema.sql feat: add Kaikki extraction and import scripts for stage 1 2026-05-05 18:11:53 +02:00
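
The commit message for import.ts mentions tracking sense_index offsets per headword+pos to handle duplicate JSONL entries. One way to read this is sketched below: when the same headword and part of speech appear in more than one JSONL record, each record numbers its senses from zero, so the importer keeps a running offset per key to keep the combined indexes unique. This is an illustration under that assumption, not the repository's implementation; `ExtractedEntry`, `SenseRow`, and `assignSenseIndexes` are hypothetical names.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ExtractedEntry { headword: string; pos: string; senses: string[] }
interface SenseRow { headword: string; pos: string; senseIndex: number; gloss: string }

// Flattens entries into one row per sense, continuing the sense numbering
// whenever a headword+pos pair has already been seen.
function assignSenseIndexes(entries: Iterable<ExtractedEntry>): SenseRow[] {
  // Next free sense index for each headword+pos key.
  const nextIndex = new Map<string, number>();
  const rows: SenseRow[] = [];
  for (const entry of entries) {
    // NUL separator avoids collisions between e.g. ("ab", "c") and ("a", "bc").
    const key = `${entry.headword}\u0000${entry.pos}`;
    const offset = nextIndex.get(key) ?? 0;
    entry.senses.forEach((gloss, i) => {
      rows.push({ headword: entry.headword, pos: entry.pos, senseIndex: offset + i, gloss });
    });
    nextIndex.set(key, offset + entry.senses.length);
  }
  return rows;
}
```

With this scheme, a second "bank" noun record continues at the index where the first one stopped, while a "bank" verb record starts again at zero, so (headword, pos, sense_index) could serve as a unique key in a flat table.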