Rethink organization of datafiles and wordlists #23

Open
opened 2026-04-19 07:23:33 +00:00 by forgejo-lila · 0 comments
Owner

Context

Data files are scattered across multiple directories with overlapping content:

  • data-sources/ — raw source files (CSV, XLS, JSON)
  • scripts/datafiles/ — processed/merged JSON files
  • scripts/data-sources/ — more raw source files (duplicates some of data-sources/)
  • packages/db/src/data/ — final JSON files consumed by the seeding script

What to do

Consolidate into a clear, non-overlapping structure. Suggested approach:

  1. One directory for raw, unprocessed source files (input to Python scripts)
  2. One directory for processed/merged output (input to TypeScript seeding)
  3. Remove duplicates
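
As a starting point for step 3, here is a minimal sketch that groups byte-identical files across the four directories listed under Context. It assumes it is run from the repository root. The names `DATA_DIRS`, `file_digest` and `find_duplicates` are illustrative and not part of the existing codebase.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Directories named in the issue; adjust if the actual layout differs.
DATA_DIRS = [
    "data-sources",
    "scripts/datafiles",
    "scripts/data-sources",
    "packages/db/src/data",
]


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large source files are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path, dirs=DATA_DIRS) -> list[list[Path]]:
    """Group files with identical content found under the given directories."""
    by_digest = defaultdict(list)
    for d in dirs:
        base = root / d
        if not base.is_dir():
            continue  # tolerate directories that are missing or already removed
        for p in sorted(base.rglob("*")):
            if p.is_file():
                by_digest[file_digest(p)].append(p.relative_to(root))
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates(Path(".")):
        print(" == ".join(p.as_posix() for p in group))
```

This only catches exact copies. Files that hold the same data in a different format or with different whitespace (for example, re-serialized JSON) would still need a manual comparison.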

Acceptance criteria

  • No duplicate data files across directories
  • Clear separation between raw sources and processed output
  • Python extraction scripts and TypeScript seeding scripts reference correct paths
  • packages/db/src/seeding-datafiles.ts still works after reorganization
  • Document the new structure in a README or in documentation/decisions.md
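
To help check the path criterion, the sketch below scans Python and TypeScript files for leftover mentions of directories that no longer exist after the move. Which directories get retired is not decided in this issue, so `RETIRED_PATHS` is a hypothetical placeholder, and `find_stale_references` is an illustrative name. It uses plain substring matching, so it will miss paths that scripts build relative to their own location.

```python
from pathlib import Path

# Hypothetical: directories assumed to be retired after consolidation.
RETIRED_PATHS = ["scripts/data-sources", "scripts/datafiles"]
SCRIPT_SUFFIXES = {".py", ".ts"}


def find_stale_references(root: Path, retired=RETIRED_PATHS):
    """Yield (file, line number, retired path) for each line mentioning a retired path."""
    for path in sorted(root.rglob("*")):
        if path.suffix not in SCRIPT_SUFFIXES or not path.is_file():
            continue
        if "node_modules" in path.parts:
            continue  # skip installed dependencies
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for old in retired:
                if old in line:
                    yield path.relative_to(root), lineno, old


if __name__ == "__main__":
    for file, lineno, old in find_stale_references(Path(".")):
        print(f"{file.as_posix()}:{lineno}: still references {old}")
```

An empty result does not prove the paths are correct. Running the Python extraction scripts and `packages/db/src/seeding-datafiles.ts` end to end remains the real test.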
forgejo-lila added the debt label 2026-04-19 07:23:33 +00:00
Reference: forgejo-lila/lila#23