lila/data-pipeline
lila 0cc643e308 feat: update extractor for all 5 languages, update import for multi-language
- Extract.ts now processes all 5 language files, filters non-English
  entries by lang_code, and skips translation extraction for non-English
  languages (their source files contain no translations)
- Import.ts now imports all 5 language output files and uses the language
  field from ExtractedSense instead of hardcoding en
- Sample limit hardcoded to 500 entries per language for development
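
The extraction rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in Extract.ts: only lang_code, ExtractedSense, its language field, and the 500-entry limit come from the commit message. The RawEntry shape, the gloss and translations fields, and the extractLanguage helper are assumptions made for the example. The sketch also applies the lang_code filter to every file, and counts the limit against kept entries, which may differ from the real script.

```typescript
// Hypothetical input shape; only `lang_code` is named in the commit message.
interface RawEntry {
  word: string;
  lang_code: string;
  senses: { glosses?: string[] }[];
  translations?: { lang_code: string; word: string }[];
}

// `ExtractedSense` and its `language` field are named in the commit message;
// the remaining fields are assumed for illustration.
interface ExtractedSense {
  word: string;
  language: string;
  gloss: string;
  translations: { language: string; word: string }[];
}

// Per-language development cap mentioned in the commit message.
const SAMPLE_LIMIT = 500;

// Hypothetical helper: extract senses for one language file.
function extractLanguage(
  entries: RawEntry[],
  language: string,
  limit: number = SAMPLE_LIMIT,
): ExtractedSense[] {
  const out: ExtractedSense[] = [];
  let kept = 0;
  for (const entry of entries) {
    if (kept >= limit) break; // stop once the sample cap is reached
    if (entry.lang_code !== language) continue; // drop entries of other languages
    kept += 1;
    // Translations are only extracted for English; other languages get none.
    const translations =
      language === "en"
        ? (entry.translations ?? []).map((t) => ({
            language: t.lang_code,
            word: t.word,
          }))
        : [];
    for (const sense of entry.senses) {
      const gloss = sense.glosses?.[0];
      if (gloss === undefined) continue; // skip senses without a gloss
      out.push({ word: entry.word, language, gloss, translations });
    }
  }
  return out;
}
```

Because each result carries its own language value, an importer can read that field per row instead of assuming a single hardcoded language.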
2026-05-05 18:46:32 +02:00
db feat: update extractor for all 5 languages, update import for multi-language 2026-05-05 18:46:32 +02:00
sample feat: add db schema, init, and vitest config 2026-05-03 17:56:29 +02:00
stage-1-extract/scripts feat: update extractor for all 5 languages, update import for multi-language 2026-05-05 18:46:32 +02:00
stage-3-enrich feat: enrich stage foundation — provider config, env setup, schema fix 2026-05-03 22:44:14 +02:00
.env.example feat: enrich stage foundation — provider config, env setup, schema fix 2026-05-03 22:44:14 +02:00
audit.ts docs: rewrite data-pipeline.md for Kaikki migration 2026-05-05 17:14:48 +02:00
package.json feat: update extractor for all 5 languages, update import for multi-language 2026-05-05 18:46:32 +02:00
pipeline.ts feat: add pipeline orchestrator skeleton with startup checks, stage runners, shutdown handler, and report generation 2026-05-03 23:01:29 +02:00
tsconfig.json feat: add db schema, init, and vitest config 2026-05-03 17:56:29 +02:00
vitest.config.ts feat: add stage 1 and 2 validation tests 2026-05-03 21:36:56 +02:00