62 commits

Author SHA1 Message Date
lila
13cc709b09 adding script to check cefr coverage between json files and database, adding script to write cefr levels from json to db 2026-04-09 10:25:20 +02:00
lila
3374bd8b20 feat(scripts): add Italian CEFR data pipeline
- Add extractors for Italian sources: it_m3.xls and italian.json
- Add comparison script (compare-italian.py) to report source overlaps and conflicts
- Add merge script (merge-italian-json.py) with priority order ['italian', 'it_m3']
- Output authoritative dataset to datafiles/italian-merged.json
- Update README to document both English and Italian pipelines
2026-04-08 18:32:03 +02:00
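
The priority merge described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the contents of merge-italian-json.py: only the priority list `['italian', 'it_m3']` comes from the commit message. The function name `merge_by_priority`, the word-to-level record shape, and the sample entries are assumptions made for the example.

```python
from typing import Dict, List

# Priority order quoted from the commit message; earlier sources win conflicts.
PRIORITY: List[str] = ["italian", "it_m3"]


def merge_by_priority(
    sources: Dict[str, Dict[str, str]], priority: List[str]
) -> Dict[str, str]:
    """Merge word -> CEFR level maps.

    The first source in `priority` that contains a word decides its level.
    (Hypothetical record shape: the real files may carry more fields.)
    """
    merged: Dict[str, str] = {}
    for name in priority:
        for word, level in sources.get(name, {}).items():
            # setdefault keeps the value set by an earlier, higher-priority source.
            merged.setdefault(word, level)
    return merged


# Hypothetical sample data, not taken from the real source files.
sample = {
    "italian": {"casa": "A1", "gatto": "A1"},
    "it_m3": {"casa": "A2", "libro": "A1"},
}
result = merge_by_priority(sample, PRIORITY)
```

Here the two sources disagree on `casa`, and the level from `italian` is kept because it comes first in the priority list. Words found only in `it_m3` still reach the merged output.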
lila
59152950d6 extraction, comparison and merging scripts for english are done, final english.json exists 2026-04-08 17:50:25 +02:00
lila
3596f76492 extraction datafiles with cefr annotations 2026-04-08 13:09:47 +02:00
lila
e79fa6922b updating schema 2026-04-07 01:03:22 +02:00
lila
0cb9fe1485 adding datafiles + updating documentation 2026-04-07 00:00:58 +02:00
lila
60cf48ef97 updating documentation 2026-04-06 17:01:34 +02:00
lila
570dbff25e updating seeding script 2026-04-06 17:01:17 +02:00
lila
aa1a332226 removing files 2026-04-06 17:01:04 +02:00
lila
6cb0068d1a adding datafiles for all english and italian nouns and verbs 2026-04-05 19:35:52 +02:00
lila
88691a345e extracted all english and italian nouns and verbs from own 2026-04-05 19:34:11 +02:00
lila
2a8630660e generating and migrating new schema 2026-04-05 19:30:05 +02:00
lila
e3c05b5596 updating seeding pipeline 2026-04-05 19:29:47 +02:00
lila
dfeb6a4cb0 updating seeding pipeline 2026-04-05 19:29:17 +02:00
lila
c49c2fe2c3 updating docs 2026-04-05 19:28:53 +02:00
lila
e80f291c41 refactoring data model 2026-04-05 18:57:09 +02:00
lila
b16b5db3f7 updating data models 2026-04-05 01:21:32 +02:00
lila
bfc09180f1 updating documentation 2026-04-05 01:21:18 +02:00
lila
7d80b20390 wip version of the api 2026-04-05 00:33:34 +02:00
lila
c24967dc74 updating docs 2026-04-05 00:33:05 +02:00
lila
1accb10f49 typo 2026-04-04 03:37:58 +02:00
lila
5180ecc864 installing zod + adding zod schemas 2026-04-02 20:02:26 +02:00
lila
874dd5e4c7 adding documentation and roadmap for the most minimal mvp 2026-04-02 18:28:44 +02:00
lila
a9cbcb719c refactoring schema + generate + migrate 2026-04-02 15:48:48 +02:00
lila
38a62ca3a4 refactoring 2026-04-02 15:48:31 +02:00
lila
cdedbc44cd refactoring 2026-04-02 13:37:54 +02:00
lila
b0c0baf9ab updating documentation 2026-04-01 18:02:12 +02:00
lila
3bb8bfdb39 feat(db): complete deck generation script for top english nouns
- add deck_terms to schema imports
- add addTermsToDeck — diffs source term IDs against existing deck_terms,
  inserts only new ones, returns count of inserted terms
- add updateValidatedLanguages — recalculates and persists validated_languages
  on every run so coverage stays accurate as translation data grows
- wire both functions into main with isNewDeck guard to avoid redundant
  validated_languages update on deck creation
- add final summary report
- fix possible undefined on result[0] in createDeck
- tick off remaining roadmap items
2026-04-01 17:56:31 +02:00
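
The diff-then-insert behaviour that the commit above attributes to addTermsToDeck can be sketched as below. The repository's function works against the database; this Python version only illustrates the logic on an in-memory list. The names `diff_new_terms` and `add_terms_to_deck` and the term IDs are hypothetical.

```python
from typing import Iterable, List, Set


def diff_new_terms(
    source_term_ids: Iterable[str], existing_term_ids: Iterable[str]
) -> List[str]:
    """Return source IDs not yet in the deck, in source order, without duplicates."""
    existing: Set[str] = set(existing_term_ids)
    seen: Set[str] = set()
    new_ids: List[str] = []
    for term_id in source_term_ids:
        if term_id not in existing and term_id not in seen:
            seen.add(term_id)
            new_ids.append(term_id)
    return new_ids


def add_terms_to_deck(deck_terms: List[str], source_term_ids: Iterable[str]) -> int:
    """Append only the missing terms and return how many were inserted."""
    new_ids = diff_new_terms(source_term_ids, deck_terms)
    deck_terms.extend(new_ids)
    return len(new_ids)


# Hypothetical deck contents and source IDs.
deck = ["t1", "t2"]
inserted_first = add_terms_to_deck(deck, ["t2", "t3", "t4", "t3"])
inserted_second = add_terms_to_deck(deck, ["t2", "t3", "t4", "t3"])
```

Because the function returns the count of newly inserted terms, a second run over the same source reports zero, which makes a re-run easy to verify from a summary report.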
lila
7fdcedd1dd wip 2026-04-01 02:43:55 +02:00
lila
a49bce4a5a adding tasks 2026-04-01 01:22:21 +02:00
lila
4ef70b3876 updating decks to include source language 2026-04-01 01:03:41 +02:00
lila
5603f15fe3 adding bug description as todo comment 2026-03-31 18:34:23 +02:00
lila
488f0dab11 wip 2026-03-31 18:28:29 +02:00
lila
9d1a82bdf0 reviewing and updating deck generation 2026-03-31 16:48:40 +02:00
lila
521ffe3b6e adding migration script 2026-03-31 10:09:30 +02:00
lila
e3a2136720 formatting 2026-03-31 10:06:06 +02:00
lila
20fa6a9331 adding datafiles and seeding script 2026-03-31 10:05:36 +02:00
lila
068949b4cb adjusting path where the database file is saved, so the data persists after reboot 2026-03-31 10:04:50 +02:00
lila
2b177aad5b feat(db): add incremental upsert seed script for WordNet vocabulary
Implements packages/db/src/seed.ts — reads all JSON files from
scripts/datafiles/, validates filenames against supported language
codes and POS, and upserts synsets into  and
via onConflictDoNothing. Safe to re-run; produces 0 writes on
a duplicate run.
2026-03-30 15:58:01 +02:00
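
The two ideas in the commit above, filename validation and a seed that is safe to re-run, can be illustrated with a small sketch. Everything specific here is an assumption: the `<lang>-<pos>.json` naming scheme, the supported values, and the `demo_synsets` table are hypothetical, and SQLite's `INSERT OR IGNORE` stands in for the `onConflictDoNothing` call that the real seed.ts uses against PostgreSQL.

```python
import re
import sqlite3
from typing import List, Optional, Tuple

# Assumed values for illustration; the real lists live in the shared package.
SUPPORTED_LANGUAGE_CODES = {"en", "it"}
SUPPORTED_POS = {"noun", "verb"}

# Hypothetical naming scheme: "<lang>-<pos>.json".
FILENAME_RE = re.compile(r"^([a-z]{2})-([a-z]+)\.json$")


def parse_datafile_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (language, pos) for a valid filename, or None if unsupported."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    lang, pos = match.group(1), match.group(2)
    if lang not in SUPPORTED_LANGUAGE_CODES or pos not in SUPPORTED_POS:
        return None
    return lang, pos


def seed(conn: sqlite3.Connection, rows: List[Tuple[str, str]]) -> int:
    """Insert rows, skipping primary-key conflicts; return the number of writes."""
    written = 0
    for synset_id, pos in rows:
        cur = conn.execute(
            "INSERT OR IGNORE INTO demo_synsets (synset_id, pos) VALUES (?, ?)",
            (synset_id, pos),
        )
        written += cur.rowcount  # 1 for a new row, 0 when the row was ignored
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_synsets (synset_id TEXT PRIMARY KEY, pos TEXT NOT NULL)")
rows = [("s-001", "noun"), ("s-002", "noun")]  # hypothetical synset IDs
first_run = seed(conn, rows)
second_run = seed(conn, rows)
```

The second call writes nothing because every row conflicts on the primary key, which mirrors the "0 writes on a duplicate run" property stated in the commit message.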
lila
55885336ba feat(db): add drizzle schema for vocabulary and deck tables
- terms, translations, term_glosses with cascade deletes and pos check constraint
- language_pairs with source/target language check constraints and no-self-pair guard
- users with openauth_sub as identity provider key
- decks and deck_terms with composite PK and position ordering
- indexes on all hot query paths (distractor generation, deck lookups, FK joins)
- SUPPORTED_POS and SUPPORTED_LANGUAGE_CODES as single source of truth in @glossa/shared
2026-03-28 19:02:10 +01:00
lila
be7a7903c5 refactor: migrate to deck-based vocabulary curation
Database Schema:
- Add decks table for curated word lists (A1, Most Common, etc.)
- Add deck_terms join table with position ordering
- Link rooms to decks via rooms.deck_id FK
- Remove frequency_rank from terms (now deck-scoped)
- Change users.id to uuid, add openauth_sub for auth mapping
- Add room_players.left_at for disconnect tracking
- Add rooms.updated_at for stale room recovery
- Add CHECK constraints for data integrity (pos, status, etc.)

Extraction Script:
- Rewrite extract.py to mirror complete OMW dataset
- Extract all 25,204 bilingual noun synsets (en-it)
- Remove frequency filtering and block lists
- Output all lemmas per synset for full synonym support
- Seed data now uncurated; decks handle selection

Architecture:
- Separate concerns: raw OMW data in DB, curation in decks
- Enables user-created decks and multiple difficulty levels
- Rooms select vocabulary by choosing a deck
2026-03-27 16:53:26 +01:00
lila
e9e750da3e setting up python env, download word data 2026-03-26 11:41:46 +01:00
lila
a4a14828e8 no isPrimary 2026-03-26 10:11:25 +01:00
lila
c1b90b9643 chore: complete phase 0 - update decisions.md and mark phase complete 2026-03-26 09:51:03 +01:00
lila
5561d54a24 feat(infra): add docker-compose and dockerfiles for all services 2026-03-26 09:43:39 +01:00
lila
2ebf0d0a83 infra: add Docker Compose setup for local development
- Configure PostgreSQL 18 and Valkey 9.1 services
- Create multi-stage Dockerfiles for API and Web apps
- Set up pnpm workspace support in container builds
- Configure hot reload via volume mounts for both services
- Add healthchecks for service orchestration
- Support dev/production stage targets (tsx watch vs compiled)
2026-03-25 18:56:04 +01:00
lila
671d542d2d chore(db): add drizzle migration pipeline with empty schema 2026-03-24 11:04:40 +01:00
lila
a8e247829c feat(db): configure drizzle orm and postgres connection 2026-03-24 10:59:03 +01:00
lila
3faa3d4ffb installing drizzle, confirm working db connection via test script 2026-03-23 09:10:48 +01:00
lila
681c6d2b4f installing and configuring tailwind 2026-03-21 20:59:26 +01:00