# cloud
The single central hub. One deployment, internet-facing.
## What runs here
nginx-proxy, wireguard-server, keycloak, portainer, influxdb, grafana, node-red, mqtt, postfix, gitea, jenkins, sql.
See [`../docs/architecture.md`](../docs/architecture.md) for the full network topology and ingress table.
## Run
```bash
cp .env.example .env # fill in real secrets first
./deploy.sh # one-shot bring-up: containers + cert + smoke test
```
`deploy.sh` is idempotent — rerun any time. It will:
1. **Preflight** — check `.env` has all required vars
2. **Validate** `docker compose config`
3. **Bring up** containers, wait for `sql` healthcheck, wait for nginx :80
4. **Inspect cert** — figure out whether the current cert is self-signed, staging, or prod
5. **Issue / renew** the SAN cert via certbot only when needed (initial issuance, or when `ACME_CA_URI` no longer matches the current issuer); reload nginx
6. **Status** — show `docker compose ps`
7. **Smoke test** every `*.wbd-rd.nl` vhost over loopback
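
The "wait for" part of step 3 can be pictured as a small retry loop. This is only a sketch, not the contents of `deploy.sh`: the helper name `wait_for`, the container name `sql`, and the probe functions are assumptions made for illustration.

```bash
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical probe: the container name "sql" is an assumption.
sql_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' sql 2>/dev/null)" = "healthy" ]
}

# Hypothetical probe: any HTTP response on loopback :80 counts as "nginx is up".
nginx_listening() {
  curl -s -o /dev/null http://127.0.0.1:80/
}

# Example:
#   wait_for 30 2 sql_healthy && wait_for 30 2 nginx_listening
```

Keeping the probe separate from the retry loop means the same helper can gate on a healthcheck, an open port, or anything else that exits zero on success.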
The script reissues the cert **only** when the CA in `.env` changes (e.g. staging → prod) or when only the self-signed bootstrap cert is present, so repeated runs do not consume Let's Encrypt rate limits.
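
The issue-or-skip decision in steps 4 and 5 could look roughly like the sketch below. It is an illustration under assumptions, not the logic `deploy.sh` actually ships: the function names are hypothetical, and it assumes that a Let's Encrypt staging issuer string contains `STAGING`, that a prod issuer contains `Let's Encrypt`, and that anything else is the bootstrap cert.

```bash
# Classify a certificate issuer string (assumed string matching, see above).
classify_issuer() {
  case "$1" in
    *STAGING*)         echo staging ;;
    *"Let's Encrypt"*) echo prod ;;
    *)                 echo bootstrap ;;
  esac
}

# Map the configured ACME directory URL to the kind of CA behind it.
classify_ca_uri() {
  case "$1" in
    *acme-staging-v02*) echo staging ;;
    *acme-v02*)         echo prod ;;
    *)                  echo unknown ;;
  esac
}

# Print "issue" when the current issuer does not match the configured CA,
# otherwise "skip". A bootstrap cert never matches, so it always triggers issuance.
cert_action() {
  local have want
  have="$(classify_issuer "$1")"
  want="$(classify_ca_uri "$2")"
  if [ "$have" = "$want" ]; then
    echo skip
  else
    echo issue
  fi
}

# Example (the certificate path is a placeholder):
#   issuer="$(openssl x509 -in /path/to/fullchain.pem -noout -issuer)"
#   cert_action "$issuer" "$ACME_CA_URI"
```

Comparing the issuer with the configured CA, rather than always calling certbot, is what makes reruns cheap: a matching cert results in `skip` and no ACME request is made.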
### Staging → prod flip
1. Verify everything works with the staging cert (browser will warn — that's normal)
2. Edit `.env`: change `ACME_CA_URI` to `https://acme-v02.api.letsencrypt.org/directory`
3. `./deploy.sh` — script detects the CA change and force-renews against prod
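
Step 2 can also be scripted instead of edited by hand. The helper below is a hypothetical convenience, not part of the repository; it assumes `.env` holds plain `KEY=value` lines and that `ACME_CA_URI` is already present (a missing key is left missing).

```bash
# Hypothetical helper: rewrite one KEY=value line in an env file.
# Lines for other keys are copied through unchanged.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "=" }
    $1 == k { print k "=" v; next }  # replace the matching assignment
    { print }                        # pass every other line through
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (run from the directory that holds .env):
#   set_env_var .env ACME_CA_URI https://acme-v02.api.letsencrypt.org/directory
#   ./deploy.sh
```

Writing to a temporary file and moving it into place avoids the differences between GNU and BSD `sed -i`.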
## Ingress (host port bindings)
| Port | Container |
|---|---|
| tcp/80, 443 | nginx-proxy |
| tcp/8883 | nginx-proxy (MQTT-TLS via stream block) |
| udp/51820 | wireguard-server |
Everything else stays on the internal `app` / `data` / `mgmt` networks.
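
One way to check that nothing else has been published by accident is to compare the host bindings Docker reports against the table above. This is a sketch under assumptions: the function name is hypothetical, and the parsing assumes the `host:port->port/proto` form printed by `docker ps --format '{{.Ports}}'`; port ranges are not handled.

```bash
# Allowed host bindings, copied from the ingress table.
ALLOWED_BINDINGS="tcp/80 tcp/443 tcp/8883 udp/51820"

# Read `docker ps --format '{{.Ports}}'`-style text on stdin and print every
# published host binding (as proto/port) that is not in ALLOWED_BINDINGS.
# Entries without "->" are container-only ports and are ignored.
unexpected_bindings() {
  grep -oE ':[0-9]+->[0-9]+/(tcp|udp)' \
    | sed -E 's#^:([0-9]+)->[0-9]+/(tcp|udp)$#\2/\1#' \
    | sort -u \
    | while read -r binding; do
        case " $ALLOWED_BINDINGS " in
          *" $binding "*) ;;          # listed in the table, fine
          *) echo "$binding" ;;       # published but not listed
        esac
      done
}

# Example:
#   docker ps --format '{{.Ports}}' | unexpected_bindings
```

Empty output means every published port is one of the four listed bindings.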
## Adding a new stack
1. Create `stacks/<name>/` with `compose.yml`, `.env.example`, `README.md`.
2. Uncomment (or add) the `include:` entry in `compose.yml`.
3. Add the stack's env vars to `.env.example`.
4. `docker compose pull && docker compose up -d`.
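
Step 1 can be scaffolded with a few lines of shell. This is a hypothetical helper, not something the repository provides: the function name, the placeholder file contents, and the assumption that `stacks/` lives under the root directory you pass in are all illustrative.

```bash
# Hypothetical helper: create stacks/<name>/ with the three expected files.
# Usage: scaffold_stack <name> [root-dir]   (root-dir defaults to ".")
scaffold_stack() {
  local name="$1" root="${2:-.}" dir
  dir="$root/stacks/$name"
  if [ -e "$dir" ]; then
    echo "stack '$name' already exists at $dir" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  # Placeholder contents only; fill in real services and variables by hand.
  printf '# compose file for the %s stack (placeholder)\n' "$name" > "$dir/compose.yml"
  printf '# env vars required by the %s stack (placeholder)\n' "$name" > "$dir/.env.example"
  printf '# %s\n' "$name" > "$dir/README.md"
}

# Example:
#   scaffold_stack my-stack /path/to/repo-root
```

Steps 2 to 4 (the `include:` entry, the shared `.env.example`, and the pull/up) still need to be done by hand, since they touch files that other stacks share.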