feat(scripts): add Italian CEFR data pipeline
- Add extractors for Italian sources: it_m3.xls and italian.json
- Add comparison script (compare-italian.py) to report source overlaps and conflicts
- Add merge script (merge-italian-json.py) with priority order ['italian', 'it_m3']
- Output authoritative dataset to datafiles/italian-merged.json
- Update README to document both English and Italian pipelines
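
A minimal sketch of how a priority-ordered merge and a conflict report might work. This is an illustration, not the contents of merge-italian-json.py or compare-italian.py. The word-to-level mapping, the function names `merge_by_priority` and `find_conflicts`, and the sample entries are all assumptions; only the priority order ['italian', 'it_m3'] comes from the commit message.

```python
from typing import Dict, List

# Hypothetical in-memory shape: {source_name: {word: cefr_level}}.
# The real extracted JSON files may be structured differently.
Sources = Dict[str, Dict[str, str]]

PRIORITY: List[str] = ["italian", "it_m3"]  # order stated in the commit message


def merge_by_priority(sources: Sources, priority: List[str]) -> Dict[str, str]:
    """Merge sources so that earlier names in `priority` win on conflict."""
    merged: Dict[str, str] = {}
    # Apply lowest priority first; higher-priority sources overwrite it.
    for name in reversed(priority):
        merged.update(sources.get(name, {}))
    return merged


def find_conflicts(sources: Sources, priority: List[str]) -> Dict[str, Dict[str, str]]:
    """Return words that appear in several sources with differing levels."""
    words = set().union(*(sources.get(name, {}).keys() for name in priority))
    conflicts: Dict[str, Dict[str, str]] = {}
    for word in sorted(words):
        levels = {
            name: sources[name][word]
            for name in priority
            if word in sources.get(name, {})
        }
        if len(set(levels.values())) > 1:
            conflicts[word] = levels
    return conflicts


# Made-up sample data, for illustration only.
sample: Sources = {
    "italian": {"casa": "A1", "gatto": "A2"},
    "it_m3": {"casa": "A2", "libro": "B1"},
}
merged = merge_by_priority(sample, PRIORITY)
conflicts = find_conflicts(sample, PRIORITY)
```

In this sketch the higher-priority source silently wins in the merged output, while the conflict report keeps every disagreeing value so it can be reviewed separately.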
parent 59152950d6
commit 3374bd8b20
9 changed files with 208535 additions and 26 deletions
22076
scripts/data-sources/italian/it_m3-extracted.json
Normal file
File diff suppressed because it is too large
72212
scripts/data-sources/italian/italian-extracted.json
Normal file
File diff suppressed because it is too large