12 commits

Author SHA1 Message Date
lila
04a581efe1 WIP: checkpoint before stage-3 sub-stage rewrite 2026-05-12 22:13:14 +02:00
lila
73fb12ac35 feat: enrich script working, redesigning to sub-stage architecture
- Enrich script functional with timeout, progress tracking, rejection mechanism
- Identified ordering issue: CEFR voting needs validated translations first
- Redesign: round1_gloss → round1_example → round1_translations → round1_cefr
- Update data-pipeline.md with new sub-stage design and roadmap
- Qwen3.5-4B confirmed working with thinking disabled
2026-05-07 13:09:43 +02:00
lila
7f10c35e03 docs: update roadmap — stage 3 enrich script written, llama.cpp next 2026-05-05 19:30:18 +02:00
lila
1c44ef989b feat: update pipeline orchestrator for Kaikki — wire up stages 1 and 2
- Replace checkOmwExists with checkExtractedFilesExist
- Wire up importKaikki and reverseLink as real stage implementations
- Track reverse link completion via sentinel row in run_status
- Update report to use resolved_entry_cefr and entry counts
- Stages 3 onwards remain as stubs
2026-05-05 19:04:28 +02:00
lila
b5a76ee178 docs: update roadmap — stage 1 in progress, sample extraction complete 2026-05-05 18:52:10 +02:00
lila
38d8b85228 docs: rewrite data-pipeline.md for Kaikki migration 2026-05-05 17:14:48 +02:00
lila
f59399be02 feat: add db import script, fix duplicate translations in extract, add annotate script 2026-05-03 22:05:10 +02:00
lila
4fa3073412 feat: add db schema, init, and vitest config 2026-05-03 17:56:29 +02:00
lila
74cfc82bdd docs: finalise data-pipeline.md with tiebreak, pipeline.db, reports, sync 2026-05-03 17:21:02 +02:00
lila
6007fe1e38 docs: update data-pipeline.md and llm-setup.md to reflect sqlite architecture 2026-05-02 20:13:05 +02:00
lila
4f59f3bc14 formatting 2026-04-28 13:18:18 +02:00
lila
9a3376cdcc updating docs 2026-04-21 15:40:26 +02:00
Renamed from documentation/PIPELINE.md