lila data pipeline

One paragraph: what this is, why it exists, and what it feeds into.

Overview

Flow diagram: OMW + CEFR sources → Extract → Annotate → Enrich (LLM) → Merge → JSON → TS seeder → DB
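
A minimal sketch of how the stages in the flow above could chain together, up to the JSON output. Everything here is an assumption made for illustration: the function names, the `lemma`/`pos`/`cefr` fields, and the dedupe rule in `merge` are hypothetical, and the LLM step is a pass-through stub. The TS seeder and DB steps are not covered.

```python
import json


def extract(raw_rows):
    """Keep only the fields later stages need (hypothetical field names)."""
    return [{"lemma": row["lemma"], "pos": row["pos"]} for row in raw_rows]


def annotate(entries, cefr_by_lemma):
    """Attach a CEFR level where the CEFR source has one, else None."""
    return [{**e, "cefr": cefr_by_lemma.get(e["lemma"])} for e in entries]


def enrich(entries):
    """Placeholder for the LLM step: returns the entries unchanged."""
    return list(entries)


def merge(entries):
    """Drop duplicate (lemma, pos) pairs, keeping the first occurrence."""
    seen = set()
    merged = []
    for e in entries:
        key = (e["lemma"], e["pos"])
        if key not in seen:
            seen.add(key)
            merged.append(e)
    return merged


def run_pipeline(raw_rows, cefr_by_lemma):
    """Run Extract -> Annotate -> Enrich -> Merge and serialize to JSON."""
    entries = merge(enrich(annotate(extract(raw_rows), cefr_by_lemma)))
    return json.dumps(entries, ensure_ascii=False, indent=2)
```

Keeping each stage a plain function from a list of entries to a list of entries means any stage can be run or tested on its own, which matches the per-stage "input, output, how to run" notes planned for this README.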

Table: language, filename, approx. coverage, with a note pointing to COVERAGE.md for detail.

For each stage: what it does, input, output, how to run.

Table: language code, name, CEFR source file; full detail in COVERAGE.md.

Step by step.

POS values, CEFR levels, difficulty mapping, language codes.
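
A possible shape for the reference constants, as a sketch only. The six CEFR levels are the standard ones; the difficulty labels and the grouping of levels into them are hypothetical and stand in for whatever mapping the pipeline actually uses. POS values and language codes are left out because the outline does not list them.

```python
# The six standard CEFR levels, from lowest to highest.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")

# Hypothetical mapping: the labels and the grouping are assumptions.
DIFFICULTY_BY_CEFR = {
    "A1": "easy",
    "A2": "easy",
    "B1": "medium",
    "B2": "medium",
    "C1": "hard",
    "C2": "hard",
}


def difficulty_for(level: str) -> str:
    """Map a CEFR level to a difficulty label, rejecting unknown levels."""
    normalized = level.strip().upper()
    if normalized not in CEFR_LEVELS:
        raise ValueError(f"unknown CEFR level: {level!r}")
    return DIFFICULTY_BY_CEFR[normalized]
```

Failing loudly on an unknown level, rather than falling back to a default, keeps a typo in a CEFR source file from silently ending up in the seeded data.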