lila/.gitignore
lila 209d52f54b feat: add Kaikki extraction and import scripts for stage 1
- Add stage-1-extract/scripts/extract.ts — streams Kaikki JSONL,
  filters to supported POS and languages, skips abbreviations and
  senses with no translations in supported languages
- Rewrite db/import.ts for Kaikki flat model — tracks sense_index
  offsets per headword+pos to handle duplicate JSONL entries
- Rewrite db/schema.sql for Kaikki model — entries, translations,
  LLM vote tables, resolved tables
- Add extract and db:import scripts to package.json
- Sample mode hardcoded to 500 entries for development
2026-05-05 18:11:53 +02:00

node_modules/
dist/
build/
.env
**/*.tsbuildinfo
.repomixignore
repomix.config.json
repomix/
venv/
__pycache__/
*.pyc
data-pipeline/archive/
data-pipeline/stage-1-extract/output/
data-pipeline/stage-1-extract/sources/
data-pipeline/stage-2-annotate/output/
data-pipeline/stage-3-enrich/output/
data-pipeline/stage-4-merge/output/
data-pipeline/db/pipeline.db
data-pipeline/reports/
data-pipeline/.env