feat(scripts): add Italian CEFR data pipeline

- Add extractors for Italian sources: it_m3.xls and italian.json
- Add comparison script (compare-italian.py) to report source overlaps and conflicts
- Add merge script (merge-italian-json.py) with priority order ['italian', 'it_m3']
- Output authoritative dataset to datafiles/italian-merged.json
- Update README to document both English and Italian pipelines
lila 2026-04-08 18:32:03 +02:00
parent 59152950d6
commit 3374bd8b20
9 changed files with 208535 additions and 26 deletions


@@ -1,11 +1,16 @@
 # CEFR Data Pipeline
 
-This directory contains the source data files and extraction/merge pipeline for generating CEFR-enriched datasets. The final output (`english-merged.json`) is consumed by the database seeding process in `packages/db`.
+This directory contains the source data files and extraction/merge pipeline for generating CEFR-enriched datasets. The final outputs (`english-merged.json`, `italian-merged.json`) are consumed by the database seeding process in `packages/db`.
 
 ## Overview
 
 The pipeline transforms raw vocabulary data from multiple sources into a standardized format, resolves conflicts between sources, and produces an authoritative CEFR dataset per language. This dataset is then used by the Glossa database package to update translation records.
 
+## Supported Languages
+
+- ✅ English (`en`)
+- ✅ Italian (`it`)
+
 ## Pipeline Stages
 
 ### Stage 1: Extraction
@@ -22,12 +27,16 @@ Each source file is processed by a dedicated extractor script. The extractor rea
 - CEFR levels are validated against A1-C2
 - Each record includes the source identifier for traceability
 
-**Location:** `extraction-scripts/english/`
-**Scripts:**
-
-- `extract-cefrj-csv.py`
-- `extract-en_m3.py`
-- `extract-octanove.py`
-- `extract-random-json.py`
+**Extractor Scripts:**
+
+| Language | Source         | Script                                               |
+|----------|----------------|------------------------------------------------------|
+| English  | `cefrj.csv`    | `extraction-scripts/english/extract-cefrj-csv.py`    |
+| English  | `en_m3.xls`    | `extraction-scripts/english/extract-en_m3.py`        |
+| English  | `octanove.csv` | `extraction-scripts/english/extract-octanove.py`     |
+| English  | `random.json`  | `extraction-scripts/english/extract-random-json.py`  |
+| Italian  | `it_m3.xls`    | `extraction-scripts/italian/extract-it_m3.py`        |
+| Italian  | `italian.json` | `extraction-scripts/italian/extract-italian-json.py` |
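Every extractor emits records in the same normalized shape, which is what makes the later comparison and merge stages source-agnostic. A minimal sketch of that shape and its validation rules (`validate_record` is a hypothetical helper for illustration, not part of the pipeline):

```python
# Stage 1 record shape: each extractor emits dicts of this form.
# validate_record is a hypothetical illustration helper.
SUPPORTED_POS = {"noun", "verb"}
CEFR_LEVELS = {"A1", "A2", "B1", "B2", "C1", "C2"}

def validate_record(record: dict) -> bool:
    """Check that an extracted record matches the standard shape."""
    return (
        bool(record.get("word", "").strip())       # non-empty, normalized word
        and record.get("pos") in SUPPORTED_POS     # only noun/verb are kept
        and record.get("cefr") in CEFR_LEVELS      # validated against A1-C2
        and bool(record.get("source"))             # source id for traceability
    )

record = {"word": "casa", "pos": "noun", "cefr": "A1", "source": "it_m3"}
print(validate_record(record))  # True
```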
### Stage 2: Comparison
@@ -39,17 +48,18 @@ Before merging, sources are compared to identify agreements and conflicts. This
 - Overlap between sources (words appearing in multiple sources)
 - Agreement rate (sources assigning the same CEFR level)
 - Conflicts (same word/POS with different CEFR levels)
+- Database coverage (how many extracted words exist in the database)
 
-**Location:** `comparison-scripts/compare-english.py`
-**Usage:**
+**Comparison Scripts:**
+
+| Language | Script                                  |
+|----------|-----------------------------------------|
+| English  | `comparison-scripts/compare-english.py` |
+| Italian  | `comparison-scripts/compare-italian.py` |
+
+Run from the `scripts/` directory:
 
 ```bash
+cd scripts/
 python comparison-scripts/compare-english.py
+python comparison-scripts/compare-italian.py
 ```
+
+Conflicts are resolved in the next stage using source priority rules.
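The agreement/conflict split described above reduces to grouping CEFR assignments by `(word, pos)` key. A standalone sketch with two toy source lists (not the actual script):

```python
from collections import defaultdict

# Toy extracted data from two sources (same shape as *-extracted.json).
sources = {
    "italian": [{"word": "casa", "pos": "noun", "cefr": "A1"},
                {"word": "andare", "pos": "verb", "cefr": "A2"}],
    "it_m3":   [{"word": "casa", "pos": "noun", "cefr": "A1"},
                {"word": "andare", "pos": "verb", "cefr": "B1"}],
}

# Group CEFR assignments by (word, pos) key.
assignments = defaultdict(dict)
for src, entries in sources.items():
    for e in entries:
        assignments[(e["word"], e["pos"])][src] = e["cefr"]

# Multi-source keys agree if every source assigned the same level.
agreements = [k for k, v in assignments.items()
              if len(v) > 1 and len(set(v.values())) == 1]
conflicts = [k for k, v in assignments.items()
             if len(v) > 1 and len(set(v.values())) > 1]
print(agreements)  # [('casa', 'noun')]
print(conflicts)   # [('andare', 'verb')]
```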
### Stage 3: Merge
@@ -71,13 +81,17 @@ Difficulty is not extracted from sources. It is derived from the final CEFR leve
 The merged file includes both CEFR level and derived difficulty, plus a list of sources that contributed to each entry.
 
-**Location**: merge-scripts/merge-english-json.py
-**Usage:**
+**Merge Scripts & Priorities:**
+
+| Language | Script                                | Priority (lowest → highest)            |
+|----------|---------------------------------------|----------------------------------------|
+| English  | `merge-scripts/merge-english-json.py` | `random`, `octanove`, `cefrj`, `en_m3` |
+| Italian  | `merge-scripts/merge-italian-json.py` | `italian`, `it_m3`                     |
+
+Run from the `scripts/` directory:
 
 ```bash
+cd scripts/
 python merge-scripts/merge-english-json.py
+python merge-scripts/merge-italian-json.py
 ```
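The priority rule and the CEFR → difficulty derivation can be sketched in a few lines (a toy example mirroring the merge scripts' behavior; `resolve` is an illustration helper, not pipeline code):

```python
# Priority order for Italian: later entries win conflicts.
PRIORITY_ORDER = ["italian", "it_m3"]
DIFFICULTY_MAP = {"A1": "easy", "A2": "easy", "B1": "intermediate",
                  "B2": "intermediate", "C1": "hard", "C2": "hard"}

def resolve(assignments: dict) -> tuple:
    """Pick the CEFR level assigned by the highest-priority source,
    then derive difficulty from it. Unknown sources rank lowest."""
    winner = max(assignments,
                 key=lambda s: PRIORITY_ORDER.index(s) if s in PRIORITY_ORDER else -1)
    cefr = assignments[winner]
    return cefr, DIFFICULTY_MAP.get(cefr, "unknown")

# 'andare' is A2 in italian but B1 in it_m3; it_m3 has higher priority.
print(resolve({"italian": "A2", "it_m3": "B1"}))  # ('B1', 'intermediate')
```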
### Stage 4: Enrichment
@@ -88,9 +102,11 @@ The authoritative merged file is consumed by the database package (packages/db)
 ```
 scripts/
 ├── comparison-scripts/
-│   └── compare-english.py         # Stage 2: compare extracted data
+│   ├── compare-english.py
+│   └── compare-italian.py         # Stage 2: compare extracted data
 ├── datafiles/
-│   ├── english-merged.json        # Stage 3 output (authoritative dataset)
+│   ├── english-merged.json        # Stage 3 output (authoritative)
+│   ├── italian-merged.json        # Stage 3 output (authoritative)
 │   ├── omw-noun.json
 │   └── omw-verb.json
 ├── data-sources/
@@ -105,7 +121,11 @@ scripts/
 │   │   └── random-extracted.json
 │   ├── french/                    # (future)
 │   ├── german/                    # (future)
-│   ├── italian/                   # (future)
+│   ├── italian/
+│   │   ├── it_m3.xls
+│   │   ├── it_m3-extracted.json
+│   │   ├── italian.json
+│   │   └── italian-extracted.json
 │   └── spanish/                   # (future)
 ├── extraction-scripts/
 │   └── english/
@@ -113,6 +133,9 @@ scripts/
 │       ├── extract-en_m3.py
 │       ├── extract-octanove.py
 │       └── extract-random-json.py
+│   └── italian/
+│       ├── extract-it_m3.py
+│       └── extract-italian-json.py
 ├── merge-scripts/
 │   └── merge-english-json.py      # Stage 3: merge into authority
 ├── extract-own-save-to-json.py    # script to extract words from wordnet


@ -0,0 +1,166 @@
#!/usr/bin/env python3
"""
CEFR Data Pipeline - Stage 2: Italian Comparison
Compares extracted JSON files for Italian and reports agreements and conflicts.
"""
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Supported CEFR levels
CEFR_LEVELS = {"A1", "A2", "B1", "B2", "C1", "C2"}


def load_extracted_files(data_dir: Path) -> Dict[str, List[dict]]:
    """Load all *-extracted.json files from the Italian data directory."""
    sources = {}
    for file_path in data_dir.glob("*-extracted.json"):
        source_name = file_path.stem.replace("-extracted", "")
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, list):
            sources[source_name] = data
        else:
            print(f"Warning: {file_path} does not contain a list, skipping.")
    return sources


def normalize_entry(entry: dict) -> Tuple[str, str]:
    """Return (word, pos) key for comparison."""
    return entry["word"].lower().strip(), entry["pos"].lower().strip()


def compute_statistics(sources: Dict[str, List[dict]]) -> dict:
    """Compute overlap, agreement, and conflict statistics."""
    # Per-source counts by CEFR level
    source_counts = {}
    for src, entries in sources.items():
        cefr_counts = defaultdict(int)
        for e in entries:
            cefr = e.get("cefr", "UNKNOWN")
            cefr_counts[cefr] += 1
        source_counts[src] = dict(cefr_counts)

    # Build (word, pos) -> {source: CEFR} assignments
    word_map = defaultdict(dict)
    for src, entries in sources.items():
        for e in entries:
            key = normalize_entry(e)
            word_map[key][src] = e["cefr"]

    # Compute overlaps, agreements, conflicts
    total_entries = sum(len(e) for e in sources.values())
    unique_words = len(word_map)
    overlap_stats = defaultdict(int)
    agreement_count = 0
    conflict_count = 0
    conflict_details = []
    for key, src_cefr_map in word_map.items():
        num_sources = len(src_cefr_map)
        overlap_stats[num_sources] += 1
        if num_sources > 1:
            cefr_values = set(src_cefr_map.values())
            if len(cefr_values) == 1:
                agreement_count += 1
            else:
                conflict_count += 1
                conflict_details.append(
                    {"word": key[0], "pos": key[1], "assignments": dict(src_cefr_map)}
                )

    return {
        "source_counts": source_counts,
        "total_entries": total_entries,
        "unique_words": unique_words,
        "overlap_distribution": dict(overlap_stats),
        "agreements": agreement_count,
        "conflicts": conflict_count,
        "conflict_details": conflict_details,
    }


def print_report(stats: dict, sources: Dict[str, List[dict]]):
    """Print formatted comparison report."""
    print(f"\n{'=' * 60}")
    print("CEFR COMPARISON REPORT - ITALIAN")
    print(f"{'=' * 60}")

    # Source entry counts
    print("\n📊 ENTRIES PER SOURCE AND CEFR LEVEL")
    print("-" * 50)
    for src, counts in stats["source_counts"].items():
        total = sum(counts.values())
        print(f"\n{src}: {total} total entries")
        for level in sorted(CEFR_LEVELS):
            cnt = counts.get(level, 0)
            if cnt > 0:
                print(f"  {level}: {cnt}")
        # Show non-standard levels
        for level, cnt in counts.items():
            if level not in CEFR_LEVELS and level != "UNKNOWN":
                print(f"  {level}: {cnt} (non-standard)")

    # Overlap statistics
    print("\n🔄 OVERLAP BETWEEN SOURCES")
    print("-" * 50)
    print(f"Total unique (word, POS) combinations: {stats['unique_words']}")
    print(f"Total entries across all sources: {stats['total_entries']}")
    overlap = stats["overlap_distribution"]
    for n_sources in sorted(overlap.keys()):
        count = overlap[n_sources]
        pct = (count / stats["unique_words"]) * 100
        print(f"Words appearing in {n_sources} source(s): {count} ({pct:.1f}%)")

    # Agreement and conflicts
    print("\n⚖️ AGREEMENT / CONFLICT SUMMARY")
    print("-" * 50)
    print(f"Words with >1 source: {stats['agreements'] + stats['conflicts']}")
    print(f"  ✅ Agreements (same CEFR): {stats['agreements']}")
    print(f"  ❌ Conflicts (different CEFR): {stats['conflicts']}")
    if stats["conflicts"] > 0:
        agreement_rate = (
            stats["agreements"] / (stats["agreements"] + stats["conflicts"])
        ) * 100
        print(f"  Agreement rate: {agreement_rate:.1f}%")
        print("\n📋 CONFLICT DETAILS (first 10 shown):")
        for i, conflict in enumerate(stats["conflict_details"][:10]):
            print(f"  {i + 1}. {conflict['word']} ({conflict['pos']})")
            for src, cefr in conflict["assignments"].items():
                print(f"       {src}: {cefr}")
        if len(stats["conflict_details"]) > 10:
            print(f"  ... and {len(stats['conflict_details']) - 10} more conflicts.")

    print(f"\n{'=' * 60}\n")


def main():
    # Determine paths relative to this script
    script_dir = Path(__file__).parent
    data_dir = script_dir.parent / "data-sources" / "italian"
    if not data_dir.exists():
        print(f"Error: Italian data directory not found: {data_dir}")
        return

    print(f"Loading extracted files from {data_dir}...")
    sources = load_extracted_files(data_dir)
    if not sources:
        print("No extracted files found.")
        return

    print(f"Found sources: {', '.join(sources.keys())}")
    stats = compute_statistics(sources)
    print_report(stats, sources)


if __name__ == "__main__":
    main()

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -91,12 +91,12 @@ def extract() -> None:
     print(f"Extracted: {len(records)} records")
     print(f"  - Nouns: {noun_count}")
     print(f"  - Verbs: {verb_count}")
-    print(f"\nCEFR distribution:")
+    print("\nCEFR distribution:")
     for level in CEFR_LEVELS:
         if level in cefr_distribution:
             print(f"  - {level}: {cefr_distribution[level]}")
-    print(f"\nSkipped:")
+    print("\nSkipped:")
     print(f"  - Unsupported POS: {skipped_pos}")
     print(f"  - Invalid CEFR: {skipped_invalid_cefr}")
     print(f"  - Empty word: {skipped_empty_word}")


@ -0,0 +1,114 @@
#!/usr/bin/env python3
"""
scripts/extraction-scripts/italian/extract-it_m3.py

Extracts CEFR data from it_m3.xls (Italian M3 wordlist).
"""
import json
from pathlib import Path

import xlrd

# Constants matching @glossa/shared
SUPPORTED_POS = ["noun", "verb"]
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# POS mapping (case-insensitive) based on observed abbreviations
POS_MAP = {
    "n": "noun",  # nome
    "v": "verb",  # verbo
}

# Column indices (0-based) verified from sample
WORD_COL = 0  # Lemma
POS_COL = 1   # Pos
CEFR_COL = 2  # Points (CEFR level)

# Paths (relative to project root)
INPUT_FILE = Path("scripts/data-sources/italian/it_m3.xls")
OUTPUT_FILE = Path("scripts/data-sources/italian/it_m3-extracted.json")


def extract() -> None:
    print(f"Reading: {INPUT_FILE}")
    records = []
    skipped_pos = 0
    skipped_invalid_cefr = 0
    skipped_empty_word = 0
    total_rows = 0

    wb = xlrd.open_workbook(INPUT_FILE)
    ws = wb.sheet_by_index(0)

    # Skip header row, start from row 1
    for row_idx in range(1, ws.nrows):
        total_rows += 1
        word_raw = ws.cell_value(row_idx, WORD_COL)
        pos_raw = ws.cell_value(row_idx, POS_COL)
        cefr_raw = ws.cell_value(row_idx, CEFR_COL)

        # Normalize POS (case-insensitive)
        pos = str(pos_raw).lower().strip() if pos_raw else ""
        if pos not in POS_MAP:
            skipped_pos += 1
            continue
        pos = POS_MAP[pos]

        # Normalize CEFR; some cells wrap the level in Unicode smart quotes
        cefr_str = str(cefr_raw).strip() if cefr_raw else ""
        cefr_str = cefr_str.strip("\u201c\u201d")
        cefr = cefr_str.upper()
        if cefr not in CEFR_LEVELS:
            skipped_invalid_cefr += 1
            continue

        # Normalize word. Some lemmas list multiple forms (e.g. "il, lo, la");
        # splitting on the comma would lose variants, so keep the full string
        # and just lowercase it.
        word_raw_str = str(word_raw).strip() if word_raw else ""
        word = word_raw_str.lower()
        if not word:
            skipped_empty_word += 1
            continue

        records.append({"word": word, "pos": pos, "cefr": cefr, "source": "it_m3"})

    # Write output
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

    # Stats
    noun_count = sum(1 for r in records if r["pos"] == "noun")
    verb_count = sum(1 for r in records if r["pos"] == "verb")
    cefr_distribution = {}
    for level in CEFR_LEVELS:
        count = sum(1 for r in records if r["cefr"] == level)
        if count > 0:
            cefr_distribution[level] = count

    print(f"\nTotal rows in XLS: {total_rows}")
    print(f"Extracted: {len(records)} records")
    print(f"  - Nouns: {noun_count}")
    print(f"  - Verbs: {verb_count}")
    print("\nCEFR distribution:")
    for level in CEFR_LEVELS:
        if level in cefr_distribution:
            print(f"  - {level}: {cefr_distribution[level]}")
    print("\nSkipped:")
    print(f"  - Unsupported POS: {skipped_pos}")
    print(f"  - Invalid CEFR: {skipped_invalid_cefr}")
    print(f"  - Empty word: {skipped_empty_word}")
    print(f"\nOutput: {OUTPUT_FILE}")


if __name__ == "__main__":
    extract()


@ -0,0 +1,91 @@
#!/usr/bin/env python3
"""
scripts/extraction-scripts/italian/extract-italian-json.py

Extracts CEFR data from italian.json (Italian flashcard source).
Filters for useful_for_flashcard=true and supported POS (noun, verb).
"""
import json
from pathlib import Path

# Constants matching @glossa/shared
SUPPORTED_POS = ["noun", "verb"]
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Paths (relative to project root)
INPUT_FILE = Path("scripts/data-sources/italian/italian.json")
OUTPUT_FILE = Path("scripts/data-sources/italian/italian-extracted.json")


def extract() -> None:
    print(f"Reading: {INPUT_FILE}")
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        data = json.load(f)

    records = []
    skipped_pos = 0
    skipped_not_useful = 0
    skipped_invalid_cefr = 0
    skipped_empty_word = 0

    for entry in data:
        # Filter: must be useful for flashcard
        if not entry.get("useful_for_flashcard", False):
            skipped_not_useful += 1
            continue

        # Filter: must have supported POS
        pos = entry.get("pos", "").lower().strip()
        if pos not in SUPPORTED_POS:
            skipped_pos += 1
            continue

        # Filter: must have valid CEFR level
        cefr = entry.get("cefr_level", "").upper().strip()
        if cefr not in CEFR_LEVELS:
            skipped_invalid_cefr += 1
            continue

        # Normalize word
        word = entry.get("word", "").lower().strip()
        if not word:
            skipped_empty_word += 1
            continue

        records.append({"word": word, "pos": pos, "cefr": cefr, "source": "italian"})

    # Write output
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)

    # Stats
    noun_count = sum(1 for r in records if r["pos"] == "noun")
    verb_count = sum(1 for r in records if r["pos"] == "verb")
    cefr_distribution = {}
    for level in CEFR_LEVELS:
        count = sum(1 for r in records if r["cefr"] == level)
        if count > 0:
            cefr_distribution[level] = count

    print(f"\nExtracted: {len(records)} records")
    print(f"  - Nouns: {noun_count}")
    print(f"  - Verbs: {verb_count}")
    print("\nCEFR distribution:")
    for level in CEFR_LEVELS:
        if level in cefr_distribution:
            print(f"  - {level}: {cefr_distribution[level]}")
    print("\nSkipped:")
    print(f"  - Not useful for flashcard: {skipped_not_useful}")
    print(f"  - Unsupported POS: {skipped_pos}")
    print(f"  - Invalid CEFR: {skipped_invalid_cefr}")
    print(f"  - Empty word: {skipped_empty_word}")
    print(f"\nOutput: {OUTPUT_FILE}")


if __name__ == "__main__":
    extract()


@ -0,0 +1,159 @@
#!/usr/bin/env python3
"""
CEFR Data Pipeline - Stage 3: Italian Merge
Merges extracted JSON files for Italian into an authoritative dataset.
"""
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Supported CEFR levels and difficulty mapping
CEFR_LEVELS = {"A1", "A2", "B1", "B2", "C1", "C2"}
DIFFICULTY_MAP = {
    "A1": "easy",
    "A2": "easy",
    "B1": "intermediate",
    "B2": "intermediate",
    "C1": "hard",
    "C2": "hard",
}

# Source priority order (from lowest to highest priority).
# Higher index = higher authority when conflicts occur.
PRIORITY_ORDER = ["italian", "it_m3"]


def load_extracted_files(data_dir: Path) -> Dict[str, List[dict]]:
    """Load all *-extracted.json files from the Italian data directory."""
    sources = {}
    for file_path in data_dir.glob("*-extracted.json"):
        source_name = file_path.stem.replace("-extracted", "")
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, list):
            sources[source_name] = data
        else:
            print(f"Warning: {file_path} does not contain a list, skipping.")
    return sources


def normalize_entry(entry: dict) -> Tuple[str, str]:
    """Return (word, pos) key for merging."""
    return entry["word"].lower().strip(), entry["pos"].lower().strip()


def get_source_priority(source_name: str) -> int:
    """Return priority index for a source (higher = more authoritative)."""
    try:
        return PRIORITY_ORDER.index(source_name)
    except ValueError:
        # If source not in list, assign lowest priority
        return -1


def merge_entries(sources: Dict[str, List[dict]]) -> List[dict]:
    """Merge entries from multiple sources, resolving conflicts by priority."""
    grouped = defaultdict(list)
    for src_name, entries in sources.items():
        for entry in entries:
            key = normalize_entry(entry)
            grouped[key].append((src_name, entry["cefr"], entry))

    merged = []
    conflicts_resolved = 0
    total_multi_source = 0
    for (word, pos), src_entries in grouped.items():
        if len(src_entries) == 1:
            src_name, cefr, _ = src_entries[0]
            final_cefr = cefr
            contributing_sources = [src_name]
        else:
            total_multi_source += 1
            sorted_entries = sorted(
                src_entries, key=lambda x: get_source_priority(x[0]), reverse=True
            )
            _, highest_cefr, _ = sorted_entries[0]
            all_cefrs = {e[1] for e in src_entries}
            if len(all_cefrs) > 1:
                conflicts_resolved += 1
            final_cefr = highest_cefr
            contributing_sources = [e[0] for e in src_entries]

        merged.append(
            {
                "word": word,
                "pos": pos,
                "cefr": final_cefr,
                "difficulty": DIFFICULTY_MAP.get(final_cefr, "unknown"),
                "sources": sorted(contributing_sources),
            }
        )

    print("Merge statistics:")
    print(f"  Total unique entries: {len(merged)}")
    print(f"  Entries with multiple sources: {total_multi_source}")
    print(f"  Conflicts resolved by priority: {conflicts_resolved}")
    return merged


def print_summary(merged: List[dict]):
    """Print distribution of CEFR levels and difficulty in the final dataset."""
    cefr_counts = defaultdict(int)
    diff_counts = defaultdict(int)
    for entry in merged:
        cefr_counts[entry["cefr"]] += 1
        diff_counts[entry["difficulty"]] += 1

    print("\n📊 Final CEFR distribution:")
    for level in sorted(CEFR_LEVELS):
        count = cefr_counts.get(level, 0)
        if count:
            print(f"  {level}: {count}")

    print("\n📊 Final difficulty distribution:")
    for diff in ["easy", "intermediate", "hard"]:
        print(f"  {diff}: {diff_counts.get(diff, 0)}")


def main():
    script_dir = Path(__file__).parent
    data_dir = script_dir.parent / "data-sources" / "italian"
    output_dir = script_dir.parent / "datafiles"
    output_file = output_dir / "italian-merged.json"

    if not data_dir.exists():
        print(f"Error: Italian data directory not found: {data_dir}")
        return
    output_dir.mkdir(parents=True, exist_ok=True)

    print(f"Loading extracted files from {data_dir}...")
    sources = load_extracted_files(data_dir)
    if not sources:
        print("No extracted files found.")
        return

    print(f"Found sources: {', '.join(sources.keys())}")
    print(f"Priority order (lowest to highest): {PRIORITY_ORDER}")
    merged = merge_entries(sources)

    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2, ensure_ascii=False)
    print(f"\n✅ Merged dataset written to: {output_file}")
    print_summary(merged)


if __name__ == "__main__":
    main()