updating documentation

lila 2026-04-06 17:01:34 +02:00
parent 570dbff25e
commit 60cf48ef97
3 changed files with 243 additions and 31 deletions

@@ -325,6 +325,25 @@ Exercise types split naturally into Type A (translation, current model) and Type
---
### Term glosses: Italian coverage is sparse (expected)

OMW gloss data is primarily in English. After full import:

- English glosses: 95,882 (~100% of terms)
- Italian glosses: 1,964 (~2% of terms)

This is not a data pipeline problem; it reflects the actual state of OMW. Italian
glosses simply don't exist for most synsets in the dataset.
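
The percentages above can be checked from the raw counts. This is a minimal sketch: `coverage_percent` is a hypothetical helper, and the term total is assumed to equal the English gloss count, since English covers roughly 100% of terms.

```python
def coverage_percent(gloss_count: int, term_count: int) -> float:
    """Share of terms that have a gloss, as a percentage."""
    if term_count <= 0:
        raise ValueError("term_count must be positive")
    return 100.0 * gloss_count / term_count


# Assumption: total terms is approximated by the English gloss count (95,882).
TOTAL_TERMS = 95_882
ITALIAN_GLOSSES = 1_964

italian_coverage = coverage_percent(ITALIAN_GLOSSES, TOTAL_TERMS)
print(f"Italian gloss coverage: {italian_coverage:.1f}%")
```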

**Handling in the UI:** fall back to the English gloss when no gloss exists for the
user's language. This is acceptable UX: a definition in the wrong language is better
than no definition at all.
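
The fallback rule could look roughly like the sketch below. It assumes glosses for a term are available as a mapping from language code to text; `pick_gloss`, `FALLBACK_LANGUAGE`, and the sample data are hypothetical names for illustration, not the project's actual API.

```python
from typing import Mapping, Optional

# Assumption: English ("en") is the fallback language, as described above.
FALLBACK_LANGUAGE = "en"


def pick_gloss(glosses: Mapping[str, str], user_language: str) -> Optional[str]:
    """Return the gloss in the user's language, else the English one, else None."""
    # A missing key and an empty string both count as "no gloss".
    preferred = glosses.get(user_language)
    if preferred:
        return preferred
    return glosses.get(FALLBACK_LANGUAGE) or None


# Hypothetical sample data: the first term has no Italian gloss (the common case).
english_only = {"en": "a domesticated carnivorous mammal"}
both_languages = {"en": "a small domesticated feline", "it": "piccolo felino domestico"}

print(pick_gloss(english_only, "it"))    # falls back to the English gloss
print(pick_gloss(both_languages, "it"))  # uses the Italian gloss
```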

If Italian gloss coverage needs to improve in the future, Wiktionary is the most
likely source, since it has broader multilingual definition coverage than OMW.

---

## Open Research
### Semantic category metadata source