feat: update extractor for all 5 languages, update import for multi-language

- extract.ts now processes all 5 language files, filters non-English
  entries by lang_code, and skips translation extraction for non-English
  entries (their source files contain no translations)
- import.ts now imports all 5 language output files and uses the language
  field from ExtractedSense instead of hardcoding "en"
- Sample limit hardcoded to 500 entries per language for development
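
The behaviour described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from extract.ts: the `RawEntry` shape, the fields on `ExtractedSense` other than `language`, and the `extractSenses` helper are all assumptions made for the example, and the `lang_code` filter is read as "keep only entries whose lang_code matches the file's language".

```typescript
// Hypothetical input shape; the real source files may be structured differently.
interface RawEntry {
  word: string;
  lang_code: string;
  senses: { glosses: string[] }[];
  translations?: { lang_code: string; word: string }[];
}

// Only the `language` field is named in the commit message; the rest is assumed.
interface ExtractedSense {
  word: string;
  language: string;
  gloss: string;
  translations: { lang_code: string; word: string }[];
}

// Development cap per language, as stated in the commit message.
const SAMPLE_LIMIT = 500;

function extractSenses(
  entries: RawEntry[],
  language: string,
  limit: number = SAMPLE_LIMIT,
): ExtractedSense[] {
  const out: ExtractedSense[] = [];
  let taken = 0;
  for (const entry of entries) {
    if (taken >= limit) break;
    // Drop entries that belong to a different language than the file being processed.
    if (entry.lang_code !== language) continue;
    taken++;
    // Translations are only extracted for English; other languages get an empty list.
    const translations = language === "en" ? (entry.translations ?? []) : [];
    for (const sense of entry.senses) {
      out.push({
        word: entry.word,
        language, // carried on each sense so the importer need not hardcode "en"
        gloss: sense.glosses[0] ?? "",
        translations,
      });
    }
  }
  return out;
}
```

With this shape, an importer can read `sense.language` from each record instead of assuming a fixed language code.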
lila 2026-05-05 18:46:32 +02:00
parent 209d52f54b
commit 0cc643e308
3 changed files with 173 additions and 107 deletions

@@ -4,6 +4,7 @@
"private": true,
"type": "module",
"scripts": {
"extract": "tsx stage-1-extract/scripts/extract.ts",
"db:import": "tsx db/import.ts",
"db:init": "tsx db/init.ts",
"annotate": "tsx stage-2-annotate/scripts/annotate.ts",