znetsixe 4117ec6063 feat(cloud): single-shot deploy.sh + FROST stack + healthchecks

Stage 5 — make the cloud composition spin up in one command and add
the SensorThings (FROST) stack as a fully segregated tenant.

cloud/deploy.sh — idempotent, 7-step bring-up:
  preflight → validate → up + wait → cert state → issue/renew →
  service status → endpoint smoke test. Reissues LE cert only when
  current issuer no longer matches ACME_CA_URI. Move-aside-then-
  restore-on-failure so the bootstrap cert survives a failed certbot.

stacks/frost — new stack, segregated from shared sql/rabbitmq:
  - dedicated postgis container (frost-db)
  - dedicated internal mosquitto bus (frost-mosquitto)
  - frost-http + frost-mqtt on a private frost-internal network,
    joined to cloud-app only for nginx ingress at frost.wbd-rd.nl
  - shared mosquitto stack deleted; rabbitmq remains the only public
    MQTT broker (mqtt.wbd-rd.nl:8883 via stream proxy)

stacks/sql — pg_isready healthcheck so keycloak/gitea/mlflow can gate
on service_healthy via cloud-level depends_on overrides.

stacks/nginx-proxy:
  - nginx-init service generates a self-signed bootstrap cert on
    fresh deploy so nginx starts before certbot has issued a real one
  - frost.wbd-rd.nl vhost (/FROST-Server → frost-http:8080,
    /mqtt → frost-mqtt:9876 WebSocket)

stacks/mlflow — custom Dockerfile (upstream + psycopg2-binary) so the
official image can speak to the shared sql backend.

stacks/jupyterhub — DummyAuthenticator stub gated by
JUPYTERHUB_ADMIN_PASSWORD; TODO comments point at OIDC + DockerSpawner.

stacks/rabbitmq — config/{enabled_plugins,rabbitmq.conf} stubs
(management + mqtt plugins, MQTT auth required).

stacks/portainer — ports unpublished; nginx now the only ingress.

stacks/node-red — pin to 4.1 (the floating "4" tag does not exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 16:37:58 +02:00

cloud

The single central hub. One deployment, internet-facing.

What runs here

nginx-proxy, wireguard-server, keycloak, portainer, influxdb, grafana, node-red, mqtt, postfix, gitea, jenkins, sql.

See ../docs/architecture.md for the full network topology and ingress table.

Run

cp .env.example .env     # fill in real secrets first
./deploy.sh              # one-shot bring-up: containers + cert + smoke test

deploy.sh is idempotent — rerun any time. It will:

1. Preflight: check that .env has all required vars
2. Validate: run docker compose config
3. Bring up containers, wait for the sql healthcheck, wait for nginx on :80
4. Inspect cert: determine whether the current cert is self-signed, staging, or prod
5. Issue / renew the SAN cert via certbot only when needed (initial issuance, or when ACME_CA_URI no longer matches the current issuer), then reload nginx
6. Status: show docker compose ps
7. Smoke test every *.wbd-rd.nl vhost over loopback
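
As an illustration of the preflight step, here is a minimal sketch of how required variables could be checked against .env. The helper name missing_vars is hypothetical and is not taken from deploy.sh; the real script may check a different set of variables in a different way.

```shell
#!/bin/sh
# Hypothetical preflight helper; an illustration, not the code in deploy.sh.

# Print the name of every required variable that is missing or empty
# in the given env file.
# Usage: missing_vars <env-file> VAR1 [VAR2 ...]
missing_vars() (
  env_file=$1
  shift
  for name in "$@"; do
    # A variable counts as set only if the file has a NAME=<non-empty> line.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      printf '%s\n' "$name"
    fi
  done
)
```

A caller could run `missing_vars .env ACME_CA_URI` and abort when the output is non-empty, which keeps the failure message specific to the variables that still need filling in.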

The script reissues the cert only when the CA in .env changes (e.g. staging → prod) or when only the self-signed bootstrap cert is present, so repeated runs do not consume Let's Encrypt rate limits.
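
The reissue decision can be sketched as a comparison between the kind of cert currently installed and the kind that ACME_CA_URI would issue. The function names below are hypothetical, and the classification rests on an assumption: that Let's Encrypt staging issuers carry a "(STAGING)" marker in the issuer string, and that anything not issued by Let's Encrypt is the bootstrap cert. Expiry-driven renewal is deliberately left out.

```shell
#!/bin/sh
# Hypothetical sketch of the reissue decision; not the code in deploy.sh.

# Classify an issuer string as staging, prod, or self-signed.
# Assumption: staging issuers contain the marker "(STAGING)".
classify_issuer() {
  case $1 in
    *"(STAGING)"*)     echo staging ;;
    *"Let's Encrypt"*) echo prod ;;
    *)                 echo self-signed ;;
  esac
}

# Map an ACME directory URL to the kind of cert it would issue.
expected_kind() {
  case $1 in
    *acme-staging*) echo staging ;;
    *)              echo prod ;;
  esac
}

# Succeed (exit status 0) when the installed cert does not match the
# configured CA, i.e. when a new cert should be requested.
# Usage: needs_reissue "<issuer string>" "<ACME_CA_URI>"
needs_reissue() {
  [ "$(classify_issuer "$1")" != "$(expected_kind "$2")" ]
}
```

The issuer string could come from `openssl x509 -noout -issuer -in <cert file>`; the cert path depends on how the nginx-proxy stack mounts its certificates and is not assumed here.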

Staging → prod flip

1. Verify everything works with the staging cert (the browser will warn; that is expected)
2. Edit .env: change ACME_CA_URI to https://acme-v02.api.letsencrypt.org/directory
3. Run ./deploy.sh; the script detects the CA change and force-renews against prod
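
For a scripted flip, the .env edit can be done non-interactively. The sketch below uses a hypothetical helper, set_env_var, which is not part of this repo; it rewrites one KEY=VALUE line in place and appends the key if it is absent. It assumes plain KEY=VALUE lines without quoting or export prefixes, and values without backslashes.

```shell
#!/bin/sh
# Hypothetical helper for editing .env; not part of deploy.sh.

# Set KEY to VALUE in an env file, preserving line order.
# Avoids sed -i, whose flags differ between GNU and BSD.
# Usage: set_env_var <env-file> KEY VALUE
set_env_var() (
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || exit 1
  awk -v k="$key" -v v="$value" '
    # Replace the first-column match "KEY=" with the new assignment.
    index($0, k "=") == 1 { print k "=" v; seen = 1; next }
    { print }
    # Append the assignment if the key was not present at all.
    END { if (!seen) print k "=" v }
  ' "$file" > "$tmp" || exit 1
  # Write back through the original path so file permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)
```

With that in place, the flip becomes `set_env_var .env ACME_CA_URI https://acme-v02.api.letsencrypt.org/directory` followed by `./deploy.sh`.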

Ingress (host port bindings)

Port	Container
tcp/80, tcp/443	nginx-proxy
tcp/8883	nginx-proxy (MQTT-TLS via stream block)
udp/51820	wireguard-server

Everything else stays on the internal app / data / mgmt networks.

Adding a new stack

1. Create stacks/<name>/ with compose.yml, .env.example, README.md.
2. Uncomment (or add) the include: entry in compose.yml.
3. Add the stack's env vars to .env.example.
4. Run docker compose pull && docker compose up -d.
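
The first step above can be scaffolded with a few lines of shell. This is a minimal sketch with a hypothetical function name, scaffold_stack; it only creates empty placeholder files and never overwrites existing ones. The include: entry and the env vars still have to be added by hand.

```shell
#!/bin/sh
# Hypothetical scaffolding helper; not part of this repo.

# Create stacks/<name>/ under the given root with the three files a new
# stack is expected to carry. Existing files are left untouched.
# Usage: scaffold_stack <repo-root> <stack-name>
scaffold_stack() (
  root=$1 name=$2
  dir="$root/stacks/$name"
  mkdir -p "$dir" || exit 1
  for f in compose.yml .env.example README.md; do
    # Only create the file if nothing exists at that path yet.
    [ -e "$dir/$f" ] || : > "$dir/$f"
  done
)
```

For example, `scaffold_stack . my-stack` (the stack name is illustrative) leaves an empty stacks/my-stack/ skeleton ready to fill in.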
