lila/documentation/deployment.md
lila e5595b5039
updating documentation
2026-04-14 19:35:49 +02:00


# Deployment Guide — lilastudy.com
This document describes the production deployment of the lila vocabulary trainer on a Hetzner VPS.
## Infrastructure Overview
- **VPS**: Hetzner, Debian 13, ARM64 (aarch64), 4GB RAM
- **Domain**: lilastudy.com (DNS managed on Hetzner, wildcard `*.lilastudy.com` configured)
- **Reverse proxy**: Caddy (Docker container, automatic HTTPS via Let's Encrypt)
- **Container registry**: Forgejo built-in package registry
- **Git server**: Forgejo
### Subdomain Routing
| Subdomain | Service | Container port |
|---|---|---|
| `lilastudy.com` | Frontend (nginx serving static files) | 80 |
| `api.lilastudy.com` | Express API | 3000 |
| `git.lilastudy.com` | Forgejo (web UI + container registry) | 3000 |
### Ports Exposed to the Internet
| Port | Service |
|---|---|
| 80 | Caddy (HTTP, redirects to HTTPS) |
| 443 | Caddy (HTTPS) |
| 2222 | Forgejo SSH (git clone/push) |
All other services (Postgres, API, frontend) communicate only over the internal Docker network.
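The routing table above implies a Caddyfile along these lines. This is a sketch: the upstream names assume the Docker Compose service names used later in this document (`web`, `api`, `forgejo`), and Caddy obtains the Let's Encrypt certificates automatically for each site block.

```
lilastudy.com {
	reverse_proxy web:80
}

api.lilastudy.com {
	reverse_proxy api:3000
}

git.lilastudy.com {
	reverse_proxy forgejo:3000
}
```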
## VPS Base Setup
The server has SSH key auth, ufw firewall (ports 22, 80, 443, 2222), and fail2ban configured. Docker and Docker Compose are installed via Docker's official apt repository.
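For reference, the ufw rules implied by the port list above would look roughly like this (a sketch; the exact commands run during setup may have differed):

```shell
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 80/tcp    # HTTP (Caddy)
sudo ufw allow 443/tcp   # HTTPS (Caddy)
sudo ufw allow 2222/tcp  # Forgejo SSH
sudo ufw enable
```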
Locale `en_GB.UTF-8` was generated alongside `en_US.UTF-8` to suppress SSH locale warnings from the dev laptop.
## Directory Structure on VPS
```
~/lila-app/
├── docker-compose.yml
├── Caddyfile
└── .env
~/lila-db-backups/
├── lila-db-YYYY-MM-DD_HHMM.sql.gz
└── backup.sh
```
## Docker Compose Stack
All services run in a single `docker-compose.yml` on a shared `lila-network`. The app images are pulled from the Forgejo registry.
### Services
- **caddy** — reverse proxy, the only container publishing HTTP/HTTPS ports (80, 443)
- **api** — Express backend, image from `git.lilastudy.com/forgejo-lila/lila-api:latest`
- **web** — nginx serving Vite-built static files, image from `git.lilastudy.com/forgejo-lila/lila-web:latest`
- **database** — PostgreSQL with a named volume (`lila-db`) for persistence
- **forgejo** — git server + container registry, SSH on port 2222, data in named volume (`forgejo-data`)
### Key Design Decisions
- No ports exposed on internal services — only Caddy faces the internet
- Frontend is built to static files at Docker build time; no Node process in production
- `VITE_API_URL` is baked in during the Docker build via a build arg
- The API reads all environment-specific config from `.env` (CORS origin, auth URLs, DB connection, cookie domain)
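A sketch of the compose file these decisions imply. Image names and the network are taken from this document; the base image tags, volume mount paths, and the Forgejo SSH port mapping are assumptions, and healthchecks, Caddy's certificate volume, and Forgejo's own environment are omitted for brevity.

```yaml
# Abbreviated sketch of ~/lila-app/docker-compose.yml
services:
  caddy:
    image: caddy:2                  # tag is an assumption
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
    networks: [lila-network]
  api:
    image: git.lilastudy.com/forgejo-lila/lila-api:latest
    env_file: .env
    networks: [lila-network]
  web:
    image: git.lilastudy.com/forgejo-lila/lila-web:latest
    networks: [lila-network]
  database:
    image: postgres:16              # tag is an assumption
    env_file: .env
    volumes:
      - lila-db:/var/lib/postgresql/data
    networks: [lila-network]
  forgejo:
    image: codeberg.org/forgejo/forgejo:9   # tag is an assumption
    ports: ["2222:22"]              # git SSH; mapping is an assumption
    volumes:
      - forgejo-data:/data
    networks: [lila-network]
volumes:
  lila-db:
  forgejo-data:
networks:
  lila-network:
```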
## Environment Variables
Production `.env` on the VPS:
```
DATABASE_URL=postgres://postgres:PASSWORD@database:5432/lila
POSTGRES_USER=postgres
POSTGRES_PASSWORD=PASSWORD
POSTGRES_DB=lila
BETTER_AUTH_SECRET=GENERATED_SECRET
BETTER_AUTH_URL=https://api.lilastudy.com
CORS_ORIGIN=https://lilastudy.com
COOKIE_DOMAIN=.lilastudy.com
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
GITHUB_CLIENT_ID=...
GITHUB_CLIENT_SECRET=...
```
Note: `DATABASE_URL` host is `database` (the Docker service name). Password in `DATABASE_URL` must match `POSTGRES_PASSWORD`.
## Docker Images — Build and Deploy
Images can be built manually on the dev laptop (cross-compiled for ARM64 via QEMU), pushed to the Forgejo registry, and pulled on the VPS. The CI/CD pipeline described later automates this same flow on every push to `main`.
### Build (on dev laptop)
```bash
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker build --platform linux/arm64 \
  -t git.lilastudy.com/forgejo-lila/lila-api:latest \
  --target runner -f apps/api/Dockerfile .

docker build --platform linux/arm64 \
  -t git.lilastudy.com/forgejo-lila/lila-web:latest \
  --target production \
  --build-arg VITE_API_URL=https://api.lilastudy.com \
  -f apps/web/Dockerfile .
```
QEMU registration may need to be re-run after Docker or system restarts.
### Push (from dev laptop)
```bash
docker login git.lilastudy.com
docker push git.lilastudy.com/forgejo-lila/lila-api:latest
docker push git.lilastudy.com/forgejo-lila/lila-web:latest
```
### Deploy (on VPS)
```bash
docker compose pull
docker compose up -d
```
To deploy a single service without restarting the whole stack:
```bash
docker compose pull api
docker compose up -d api
```
### Cleanup
Remove unused images after deployments:
```bash
docker image prune -f   # safe — removes only dangling images
docker system prune -a  # aggressive — removes all unused images, stopped containers, and networks
```
## Dockerfiles
### API (`apps/api/Dockerfile`)
Multi-stage build: base → deps → dev → builder → runner. The `runner` stage does a fresh `pnpm install --prod` to get correct symlinks. Output is at `apps/api/dist/src/server.js` due to monorepo rootDir configuration.
### Frontend (`apps/web/Dockerfile`)
Multi-stage build: base → deps → dev → builder → production. The `builder` stage compiles with `VITE_API_URL` baked in. The `production` stage is `nginx:alpine` serving static files from `dist/`. Includes a custom `nginx.conf` for SPA fallback routing (`try_files $uri $uri/ /index.html`).
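A minimal sketch of such an `nginx.conf`. The `try_files` line is quoted from this document; the `root` path and listen port assume nginx:alpine defaults and the container port from the routing table.

```
server {
    listen 80;
    root /usr/share/nginx/html;   # where the Dockerfile copies dist/ (assumed)
    index index.html;

    location / {
        # SPA fallback: unknown paths serve index.html so client-side routing works
        try_files $uri $uri/ /index.html;
    }
}
```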
## Monorepo Package Exports
Both `packages/shared` and `packages/db` have their `exports` in `package.json` pointing to compiled JavaScript (`./dist/src/...`), not TypeScript source. This is required for production builds where Node cannot run `.ts` files. In dev, packages must be built before running the API.
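For illustration, `packages/shared/package.json` might look like this. The package name and the exact entry-point filename are assumptions — the document only specifies the `./dist/src/...` prefix.

```json
{
  "name": "@lila/shared",
  "main": "./dist/src/index.js",
  "exports": {
    ".": "./dist/src/index.js"
  }
}
```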
## Database
### Initial Seeding
The production database was initially populated via `pg_dump` from the dev laptop:
```bash
# On dev laptop
docker exec lila-database pg_dump -U USER DB > seed.sql
scp seed.sql lila@VPS_IP:~/lila-app/
# On VPS
docker exec -i lila-database psql -U postgres -d lila < seed.sql
```
### Ongoing Data Updates
The seeding script (`packages/db/src/seeding-datafiles.ts`) uses `onConflictDoNothing()` on all inserts, making it idempotent. New vocabulary data (e.g. Spanish words) can be added by running the seeding script against production — it inserts only new records without affecting existing data or user tables.
### Schema Migrations
Schema changes are managed by Drizzle. Deploy order matters:
1. Run migration first (database gets new structure)
2. Deploy new API image (code uses new structure)
Reversing this order causes the API to crash on missing columns/tables.
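As a command sequence, the ordering looks like this. The migration command is a sketch — the package filter name and the exact way Drizzle migrations are invoked depend on the project's setup and are not specified in this document.

```shell
# 1. Apply the new schema to the production database first
pnpm --filter @lila/db exec drizzle-kit migrate   # hypothetical filter/command

# 2. Only then roll the API to the image that expects the new schema
docker compose pull api
docker compose up -d api
```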
## Backups
A cron job runs daily at 3:00 AM, dumping the database to a compressed SQL file and keeping the last 7 days:
```bash
# crontab entry (crontab -e as the lila user)
0 3 * * * /home/lila/lila-db-backups/backup.sh
```
Backups are stored in `~/lila-db-backups/` as `lila-db-YYYY-MM-DD_HHMM.sql.gz`.
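The backup script itself is not listed in this document; a minimal sketch, assuming the dump command mirrors the seeding/restore commands shown elsewhere in this guide:

```shell
#!/bin/sh
# Hypothetical sketch of ~/lila-db-backups/backup.sh
set -eu
BACKUP_DIR="${BACKUP_DIR:-$HOME/lila-db-backups}"
STAMP="$(date +%F_%H%M)"          # e.g. 2026-04-14_0300
mkdir -p "$BACKUP_DIR"

# Dump and compress; guarded so the script is a no-op if the container isn't running
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx lila-database; then
  docker exec lila-database pg_dump -U postgres lila \
    | gzip > "$BACKUP_DIR/lila-db-$STAMP.sql.gz"
fi

# Keep only the last 7 days of dumps
find "$BACKUP_DIR" -name 'lila-db-*.sql.gz' -mtime +7 -delete
```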
### Pulling Backups to Dev Laptop
A script on the dev laptop syncs new backups on login:
```bash
# ~/pull-backups.sh (runs via .profile on login)
rsync -avz --ignore-existing --include="*.sql.gz" --exclude="*" lila@VPS_IP:~/lila-db-backups/ ~/lila-backups/
```
### Restoring from Backup
```bash
gunzip -c lila-db-YYYY-MM-DD_HHMM.sql.gz | docker exec -i lila-database psql -U postgres -d lila
```
## OAuth Configuration
Google and GitHub OAuth apps must have both dev and production redirect URIs:
- **Google Cloud Console**: Authorized redirect URIs include both `http://localhost:3000/api/auth/callback/google` and `https://api.lilastudy.com/api/auth/callback/google`
- **GitHub Developer Settings**: Authorization callback URL includes both localhost and production
## Forgejo SSH
The dev laptop's `~/.ssh/config` maps `git.lilastudy.com` to port 2222:
```
Host git.lilastudy.com
Port 2222
```
This allows standard git commands without specifying the port.
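With that in place, a clone looks like any other SSH clone (the repository name here is an assumption; the org name matches the image paths used above):

```shell
git clone git@git.lilastudy.com:forgejo-lila/lila.git   # repo name is hypothetical
```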
## CI/CD Pipeline
Automated build and deploy via Forgejo Actions. On every push to `main`, the pipeline builds ARM64 images natively on the VPS, pushes them to the Forgejo registry, and restarts the app containers.
### Components
- **Forgejo Actions** — enabled by default, workflow files in `.forgejo/workflows/`
- **Forgejo Runner** — runs as a container (`lila-ci-runner`) on the VPS, uses the host's Docker socket to build images natively on ARM64
- **Workflow file** — `.forgejo/workflows/deploy.yml`
### Pipeline Steps
1. Install Docker CLI and SSH client in the job container
2. Checkout the repository
3. Login to the Forgejo container registry
4. Build API image (target: `runner`)
5. Build Web image (target: `production`, with `VITE_API_URL` baked in)
6. Push both images to `git.lilastudy.com`
7. SSH into the VPS, pull new images, restart `api` and `web` containers, prune old images
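The steps above can be sketched as a workflow file. Treat this as illustrative: the job container image, action versions, package-install commands, and SSH key handling are assumptions, not a copy of the real `deploy.yml`.

```yaml
# Sketch of .forgejo/workflows/deploy.yml
name: Build and Deploy
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: docker
    steps:
      - name: Install Docker CLI and SSH client
        run: apt-get update && apt-get install -y --no-install-recommends docker.io openssh-client
      - uses: actions/checkout@v4
      - name: Log in to the registry
        run: echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login git.lilastudy.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
      - name: Build and push images
        run: |
          docker build --target runner -f apps/api/Dockerfile \
            -t git.lilastudy.com/forgejo-lila/lila-api:latest .
          docker build --target production \
            --build-arg VITE_API_URL=https://api.lilastudy.com \
            -f apps/web/Dockerfile \
            -t git.lilastudy.com/forgejo-lila/lila-web:latest .
          docker push git.lilastudy.com/forgejo-lila/lila-api:latest
          docker push git.lilastudy.com/forgejo-lila/lila-web:latest
      - name: Deploy on the VPS
        run: |
          mkdir -p ~/.ssh && echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id && chmod 600 ~/.ssh/id
          ssh -i ~/.ssh/id -o StrictHostKeyChecking=accept-new \
            "${{ secrets.SSH_USER }}@${{ secrets.SSH_HOST }}" \
            "cd ~/lila-app && docker compose pull api web && docker compose up -d api web && docker image prune -f"
```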
### Secrets (stored in Forgejo repo settings → Actions → Secrets)
| Secret | Value |
|---|---|
| REGISTRY_USER | Forgejo username |
| REGISTRY_PASSWORD | Forgejo password |
| SSH_PRIVATE_KEY | Contents of `~/.ssh/ci-runner` on the VPS |
| SSH_HOST | VPS IP address |
| SSH_USER | `lila` |
### Runner Configuration
The runner config is at `/data/config.yml` inside the `lila-ci-runner` container. Key settings:
- `docker_host: "automount"` — mounts the host Docker socket into job containers
- `valid_volumes: ["/var/run/docker.sock"]` — allows the socket mount
- `privileged: true` — required for Docker access from job containers
- `options: "--group-add 989"` — adds the host's docker group (GID 989) to job containers
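Combined, the relevant fragment of `/data/config.yml` looks like this (other keys omitted; these settings live under the runner's `container` section):

```yaml
container:
  docker_host: "automount"
  privileged: true
  valid_volumes:
    - /var/run/docker.sock
  options: "--group-add 989"
```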
The runner command must explicitly reference the config file:
```yaml
command: '/bin/sh -c "sleep 5; forgejo-runner -c /data/config.yml daemon"'
```
### Deploy Cycle
Push to main → pipeline runs automatically (~2-5 min) → app is updated. No manual steps required.
To manually trigger a re-run: go to the repo's Actions tab, click on the latest run, and use the re-run button.
## Known Issues and Future Work
- **Backups**: Offsite backup storage (Hetzner Object Storage or similar) should be added
- **Valkey**: Not in the production stack yet. Will be added when multiplayer requires session/room state
- **Monitoring/logging**: No centralized logging or uptime monitoring configured