lila/scripts/data-sources
lila 3374bd8b20 feat(scripts): add Italian CEFR data pipeline
- Add extractors for Italian sources: it_m3.xls and italian.json
- Add comparison script (compare-italian.py) to report source overlaps and conflicts
- Add merge script (merge-italian-json.py) with priority order ['italian', 'it_m3']
- Output authoritative dataset to datafiles/italian-merged.json
- Update README to document both English and Italian pipelines
2026-04-08 18:32:03 +02:00
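
The priority merge described in the commit message could look roughly like the sketch below. This is not the code from merge-italian-json.py: the function name `merge_by_priority` and the record shape (a plain word-to-CEFR-level mapping per source) are assumptions made for illustration. Only the priority order `['italian', 'it_m3']` comes from the commit itself.

```python
from typing import Dict, List, Tuple

# Priority order as stated in the commit message: earlier sources win conflicts.
PRIORITY = ["italian", "it_m3"]

# Hypothetical record shape: each source maps a word to its CEFR level.
Source = Dict[str, str]
Conflict = Tuple[str, str, str, str, str]


def merge_by_priority(
    sources: Dict[str, Source], priority: List[str] = PRIORITY
) -> Tuple[Dict[str, Dict[str, str]], List[Conflict]]:
    """Merge sources so the first source in `priority` that has a word wins.

    Returns the merged mapping plus a list of conflicts, each as
    (word, winning_source, winning_level, losing_source, losing_level).
    """
    merged: Dict[str, Dict[str, str]] = {}
    conflicts: List[Conflict] = []
    for name in priority:
        for word, level in sources.get(name, {}).items():
            if word not in merged:
                # First (highest-priority) source to mention the word wins.
                merged[word] = {"cefr": level, "source": name}
            elif merged[word]["cefr"] != level:
                # A lower-priority source disagrees; keep the winner, log it.
                kept = merged[word]
                conflicts.append((word, kept["source"], kept["cefr"], name, level))
    return merged, conflicts


# Made-up sample data, purely to show the behaviour.
sample = {
    "italian": {"casa": "A1", "tuttavia": "B1"},
    "it_m3": {"casa": "A2", "gatto": "A1"},
}
merged, conflicts = merge_by_priority(sample)
```

With the sample data, `casa` keeps the level from `italian` because that source comes first in the priority list, `gatto` is filled in from `it_m3`, and the disagreement on `casa` is recorded as a conflict, which is the kind of information a comparison step such as compare-italian.py would report.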
english | extraction, comparison and merging scripts for english are done, final english.json exists | 2026-04-08 17:50:25 +02:00
french | extraction datafiles with cefr annotations | 2026-04-08 13:09:47 +02:00
german | extraction datafiles with cefr annotations | 2026-04-08 13:09:47 +02:00
italian | feat(scripts): add Italian CEFR data pipeline | 2026-04-08 18:32:03 +02:00
spanish | extraction datafiles with cefr annotations | 2026-04-08 13:09:47 +02:00