feat: update pipeline orchestrator for Kaikki — wire up stages 1 and 2

- Replace checkOmwExists with checkExtractedFilesExist
- Wire up importKaikki and reverseLink as real stage implementations
- Track reverse link completion via a sentinel row in run_status
- Update report to use resolved_entry_cefr and entry counts
- Stages 3 and later remain as stubs

parent 6f9a42c707
commit 1c44ef989b
2 changed files with 92 additions and 41 deletions
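
The commit message says reverse link completion is tracked with a sentinel row in `run_status`, but the orchestrator code is not shown on this page. The following is a minimal in-memory sketch of that idea, written in TypeScript on the assumption that the orchestrator is JavaScript or TypeScript (its function names are camelCase). The row shape, the `"reverse_link"` stage name, the `"*"` sentinel language value and the `RunStatus` / `runReverseLinkStage` helpers are all hypothetical; a real version would read and write the `run_status` table in `pipeline.db`.

```typescript
// Hypothetical shape of a run_status row; the real table's columns are not shown here.
interface RunStatusRow {
  stage: string;
  language: string; // a language code for per-language stages, or the sentinel below
  completedAt: string;
}

// Assumed sentinel value: reverse link sync spans every language, so it is
// recorded as one row with a reserved language value instead of one per language.
const ALL_LANGUAGES = "*";

// In-memory stand-in for the run_status table in pipeline.db.
class RunStatus {
  private rows: RunStatusRow[] = [];

  isComplete(stage: string, language: string = ALL_LANGUAGES): boolean {
    return this.rows.some((r) => r.stage === stage && r.language === language);
  }

  markComplete(stage: string, language: string = ALL_LANGUAGES): void {
    if (!this.isComplete(stage, language)) {
      this.rows.push({ stage, language, completedAt: new Date().toISOString() });
    }
  }

  rowCount(): number {
    return this.rows.length;
  }
}

// Runs the injected reverse link step once; later calls find the sentinel row and skip it.
function runReverseLinkStage(runStatus: RunStatus, reverseLink: () => number): number {
  if (runStatus.isComplete("reverse_link")) {
    return 0; // already completed on an earlier run
  }
  const inserted = reverseLink();
  runStatus.markComplete("reverse_link"); // write the sentinel only after the step succeeds
  return inserted;
}

const runStatus = new RunStatus();
console.log(runReverseLinkStage(runStatus, () => 3)); // 3
console.log(runReverseLinkStage(runStatus, () => 3)); // 0 (sentinel row found, step skipped)
```

Writing the sentinel only after the step returns means a run that fails midway leaves no row behind, so the next orchestrator run retries the stage instead of skipping it.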

@@ -314,9 +314,12 @@ These are not part of the current pipeline but are worth considering as the data
 
 ## Roadmap
 
-**Current state:** Production schema migrated to Kaikki flat model. Stage 1 extraction scripts written and sample run complete (500 entries per language). pipeline.db initialised and imported with sample data. Stage 2 reverse link sync not yet written. llama.cpp not installed.
+**Current state:** Stage 1 extraction and stage 2 reverse link sync scripts
+written and verified on sample data. pipeline.db contains 4,156 entries and
+4,287 translations across 5 languages. Stage 3 enrich scripts not yet written.
+llama.cpp not installed.
 
-**Next action:** Write the stage 2 reverse link sync script.
+**Next action:** Write the stage 3 enrich script.
 
 | Stage | Status |
 | --------------- | -------------- |
@@ -339,11 +342,11 @@ These are not part of the current pipeline but are worth considering as the data
 - [ ] Remove sample limit and run full extraction
 - [ ] Re-run full import → `pipeline.db`
 
-### Stage 2 — Reverse link sync `🔲 not started`
+### Stage 2 — Reverse link sync `🔄 in progress`
 
-- [ ] Write reverse link sync script
-- [ ] Write tests
-- [ ] Run reverse link sync → `pipeline.db`
+- [x] Write reverse link sync script
+- [x] Run reverse link sync on sample data → 141 links inserted
+- [ ] Run reverse link sync on full data after full extraction
 
 ### Stage 3 — Enrich `🔲 not started`
 
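
Stage 2 is described here only as a "reverse link sync". Assuming it means adding the missing inverse of each translation link (if entry A links to entry B, make sure B also links to A), a minimal TypeScript sketch could look like the following. The `TranslationLink` shape, the `en:dog`-style identifiers and the `linkKey` / `syncReverseLinks` helpers are hypothetical and not taken from the actual script, which works against `pipeline.db` rather than an in-memory array.

```typescript
// Hypothetical translation link between two entries, e.g. "en:dog" -> "de:Hund".
interface TranslationLink {
  from: string;
  to: string;
}

// Composite key for a directed link; assumes ids never contain a NUL character.
const linkKey = (from: string, to: string): string => `${from}\u0000${to}`;

// Appends the inverse of every link whose inverse is missing and returns how
// many links were inserted. Links already present in both directions are untouched.
function syncReverseLinks(links: TranslationLink[]): number {
  const existing: { [key: string]: boolean } = Object.create(null);
  for (const link of links) {
    existing[linkKey(link.from, link.to)] = true;
  }

  const toInsert: TranslationLink[] = [];
  for (const { from, to } of links) {
    const reverseKey = linkKey(to, from);
    if (!existing[reverseKey]) {
      existing[reverseKey] = true; // avoid queuing the same inverse twice in one pass
      toInsert.push({ from: to, to: from });
    }
  }

  links.push(...toInsert);
  return toInsert.length;
}

const links: TranslationLink[] = [
  { from: "en:dog", to: "de:Hund" },
  { from: "de:Hund", to: "en:dog" },
  { from: "en:cat", to: "fr:chat" },
];
console.log(syncReverseLinks(links)); // 1 (adds fr:chat -> en:cat)
console.log(syncReverseLinks(links)); // 0 (already in sync)
```

A second pass inserts nothing, which is the property a re-run on the full data (the remaining stage 2 task) would rely on.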