- Add stage-1-extract/scripts/extract.ts: streams Kaikki JSONL, filters to supported POS and languages, skips abbreviations and senses with no translations in supported languages
- Rewrite db/import.ts for Kaikki flat model: tracks sense_index offsets per headword+pos to handle duplicate JSONL entries
- Rewrite db/schema.sql for Kaikki model: entries, translations, LLM vote tables, resolved tables
- Add extract and db:import scripts to package.json
- Sample mode hardcoded to 500 entries for development
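
The sense_index offset tracking mentioned for db/import.ts can be sketched roughly as follows. This is a minimal illustration, not the code in the commit: the `ExtractedEntry` and `SenseRow` shapes, their field names, and the `assignSenseIndexes` helper are all hypothetical. The idea is that when the JSONL contains more than one entry for the same headword+pos, numbering continues from where the previous entry stopped instead of restarting at 0.

```typescript
// Hypothetical shapes: field names are assumptions, not the repository's actual types.
interface ExtractedEntry {
  headword: string;
  pos: string;
  senses: string[]; // glosses, in source order
}

interface SenseRow {
  headword: string;
  pos: string;
  senseIndex: number;
  gloss: string;
}

// Assigns a running sense index per headword+pos, so a second JSONL entry
// for the same key continues numbering instead of colliding at index 0.
function assignSenseIndexes(entries: ExtractedEntry[]): SenseRow[] {
  const nextIndex = new Map<string, number>();
  const rows: SenseRow[] = [];
  for (const entry of entries) {
    // NUL separator keeps "a" + "bc" distinct from "ab" + "c".
    const key = `${entry.headword}\u0000${entry.pos}`;
    let index = nextIndex.get(key) ?? 0;
    for (const gloss of entry.senses) {
      rows.push({ headword: entry.headword, pos: entry.pos, senseIndex: index, gloss });
      index += 1;
    }
    // Remember the offset for the next entry with the same key.
    nextIndex.set(key, index);
  }
  return rows;
}

const rows = assignSenseIndexes([
  { headword: "bank", pos: "noun", senses: ["financial institution", "building"] },
  { headword: "bank", pos: "verb", senses: ["to deposit"] },
  { headword: "bank", pos: "noun", senses: ["edge of a river"] },
]);
console.log(rows.map((r) => `${r.pos}:${r.senseIndex}`).join(" "));
// prints "noun:0 noun:1 verb:0 noun:2"
```

Keeping the offsets in an in-memory map assumes a single import pass; an importer that resumes across runs would presumably need to seed the map from the database first.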
21 lines
402 B
Text
node_modules/
dist/
build/
.env
**/*.tsbuildinfo
.repomixignore
repomix.config.json
repomix/
venv/
__pycache__/
*.pyc

data-pipeline/archive/
data-pipeline/stage-1-extract/output/
data-pipeline/stage-1-extract/sources/
data-pipeline/stage-2-annotate/output/
data-pipeline/stage-3-enrich/output/
data-pipeline/stage-4-merge/output/
data-pipeline/db/pipeline.db
data-pipeline/reports/
data-pipeline/.env