lila/data-pipeline
lila 209d52f54b feat: add Kaikki extraction and import scripts for stage 1
- Add stage-1-extract/scripts/extract.ts — streams Kaikki JSONL,
  filters to supported POS and languages, skips abbreviations and
  senses with no translations in supported languages
- Rewrite db/import.ts for Kaikki flat model — tracks sense_index
  offsets per headword+pos to handle duplicate JSONL entries
- Rewrite db/schema.sql for Kaikki model — entries, translations,
  LLM vote tables, resolved tables
- Add extract and db:import scripts to package.json
- Sample mode hardcoded to 500 entries for development
2026-05-05 18:11:53 +02:00
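The extract.ts bullet describes a streaming filter over Kaikki JSONL. A minimal sketch of that idea follows. The supported POS and language sets, the `KaikkiEntry` field subset, and the assumption that translations sit on each sense with a `code` field and that abbreviations carry an `abbreviation` tag are all hypothetical. They are not taken from the actual script.

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Hypothetical filter sets; the real lists live in stage-1-extract/scripts/extract.ts.
const SUPPORTED_POS = new Set(["noun", "verb", "adj", "adv"]);
const SUPPORTED_LANGS = new Set(["en", "de", "fr", "es"]);

// Assumed subset of a Kaikki JSONL line (only the fields this sketch reads).
interface KaikkiTranslation { code?: string; word?: string }
interface KaikkiSense { glosses?: string[]; tags?: string[]; translations?: KaikkiTranslation[] }
interface KaikkiEntry { word: string; pos: string; lang_code: string; senses?: KaikkiSense[] }

// Returns the entry with only usable senses, or null if nothing survives.
export function filterEntry(entry: KaikkiEntry): KaikkiEntry | null {
  if (!SUPPORTED_POS.has(entry.pos) || !SUPPORTED_LANGS.has(entry.lang_code)) return null;
  const senses = (entry.senses ?? [])
    // Assumption: abbreviations are marked by an "abbreviation" sense tag.
    .filter((s) => !(s.tags ?? []).includes("abbreviation"))
    // Keep only translations into supported languages.
    .map((s) => ({
      ...s,
      translations: (s.translations ?? []).filter((t) => SUPPORTED_LANGS.has(t.code ?? "")),
    }))
    // Drop senses left with no supported translations.
    .filter((s) => s.translations.length > 0);
  return senses.length > 0 ? { ...entry, senses } : null;
}

// Streams the file line by line so the full dump never sits in memory.
export async function* extract(path: string, limit = Infinity): AsyncGenerator<KaikkiEntry> {
  const input = createReadStream(path);
  const rl = createInterface({ input, crlfDelay: Infinity });
  let emitted = 0;
  try {
    for await (const line of rl) {
      if (emitted >= limit) break;
      if (line.trim() === "") continue;
      const kept = filterEntry(JSON.parse(line) as KaikkiEntry);
      if (kept) {
        emitted++;
        yield kept;
      }
    }
  } finally {
    rl.close();
    input.destroy();
  }
}
```

Passing `limit = 500`, as in `for await (const e of extract("kaikki.jsonl", 500))` (file name hypothetical), roughly mirrors the hardcoded 500-entry sample mode. Making the limit a parameter is one way to avoid hardcoding it.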
db feat: add Kaikki extraction and import scripts for stage 1 2026-05-05 18:11:53 +02:00
sample feat: add db schema, init, and vitest config 2026-05-03 17:56:29 +02:00
stage-1-extract/scripts feat: add Kaikki extraction and import scripts for stage 1 2026-05-05 18:11:53 +02:00
stage-3-enrich feat: enrich stage foundation — provider config, env setup, schema fix 2026-05-03 22:44:14 +02:00
.env.example feat: enrich stage foundation — provider config, env setup, schema fix 2026-05-03 22:44:14 +02:00
audit.ts docs: rewrite data-pipeline.md for Kaikki migration 2026-05-05 17:14:48 +02:00
package.json feat: enrich stage foundation — provider config, env setup, schema fix 2026-05-03 22:44:14 +02:00
pipeline.ts feat: add pipeline orchestrator skeleton with startup checks, stage runners, shutdown handler, and report generation 2026-05-03 23:01:29 +02:00
tsconfig.json feat: add db schema, init, and vitest config 2026-05-03 17:56:29 +02:00
vitest.config.ts feat: add stage 1 and 2 validation tests 2026-05-03 21:36:56 +02:00
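The latest commit says db/import.ts "tracks sense_index offsets per headword+pos to handle duplicate JSONL entries". One way to read that: Kaikki can emit several lines for the same word and POS (for example, separate etymologies), each numbering its senses from 0. A running offset per (headword, pos) keeps `sense_index` unique across those lines. The sketch below illustrates that reading only. `SenseIndexer`, `SenseRow`, and the row fields are hypothetical names, not the real import code or schema.

```typescript
// Hypothetical row shape; the actual columns are defined in db/schema.sql.
interface SenseRow { headword: string; pos: string; senseIndex: number; gloss: string }
interface ImportEntry { word: string; pos: string; senses: { glosses?: string[] }[] }

export class SenseIndexer {
  // Next free sense_index for each (headword, pos) pair seen so far.
  private readonly offsets = new Map<string, number>();

  rowsFor(entry: ImportEntry): SenseRow[] {
    // JSON array key avoids collisions such as ("a b", "c") vs ("a", "b c").
    const key = JSON.stringify([entry.word, entry.pos]);
    const start = this.offsets.get(key) ?? 0;
    const rows = entry.senses.map((s, i) => ({
      headword: entry.word,
      pos: entry.pos,
      senseIndex: start + i, // continues numbering after earlier duplicate lines
      gloss: s.glosses?.[0] ?? "",
    }));
    this.offsets.set(key, start + entry.senses.length);
    return rows;
  }
}
```

The offsets live only in memory, so they are valid for a single import run over the whole file. A resumable or batched import would need to seed them from the database, e.g. the current maximum `sense_index` per headword and POS.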