feat(cloud): single-shot deploy.sh + FROST stack + healthchecks

Stage 5 — make the cloud composition spin up in one command and add
the SensorThings (FROST) stack as a fully segregated tenant.

cloud/deploy.sh — idempotent, 7-step bring-up:
  preflight → validate → up + wait → cert state → issue/renew →
  service status → endpoint smoke test. Reissues the LE cert only when
  the current cert is the bootstrap dummy or its issuer no longer
  matches ACME_CA_URI. Move-aside-then-restore-on-failure so the
  bootstrap cert survives a failed certbot run.

stacks/frost — new stack, segregated from shared sql/rabbitmq:
  - dedicated postgis container (frost-db)
  - dedicated internal mosquitto bus (frost-mosquitto)
  - frost-http + frost-mqtt on a private frost-internal network,
    joined to cloud-app only for nginx ingress at frost.wbd-rd.nl
  - shared mosquitto stack deleted; rabbitmq remains the only public
    MQTT broker (mqtt.wbd-rd.nl:8883 via stream proxy)

stacks/sql — pg_isready healthcheck so keycloak/gitea/mlflow can gate
on service_healthy via cloud-level depends_on overrides.

stacks/nginx-proxy:
  - nginx-init service generates a self-signed bootstrap cert on
    fresh deploy so nginx starts before certbot has issued a real one
  - frost.wbd-rd.nl vhost (/FROST-Server → frost-http:8080,
    /mqtt → frost-mqtt:9876 WebSocket)

stacks/mlflow — custom Dockerfile (upstream + psycopg2-binary) so the
official image can speak to the shared sql backend.

stacks/jupyterhub — DummyAuthenticator stub gated by
JUPYTERHUB_ADMIN_PASSWORD; TODO comments point at OIDC + DockerSpawner.

stacks/rabbitmq — config/{enabled_plugins,rabbitmq.conf} stubs
(management + mqtt plugins, MQTT auth required).

stacks/portainer — ports unpublished; nginx now the only ingress.

stacks/node-red — pin to 4.1 (the floating "4" tag does not exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
znetsixe
2026-05-21 16:37:58 +02:00
parent 035ac757ae
commit 4117ec6063
25 changed files with 660 additions and 95 deletions

1
.gitignore vendored

@@ -14,6 +14,7 @@ Thumbs.db
*~
.vscode/
.idea/
.claude/
# Logs
*.log


@@ -47,7 +47,7 @@ docker compose up -d
| sql | Config DB (postgres 16) | ✓ | — |
| mlflow | ML experiment tracking + registry | ✓ | — |
| jupyterhub | Multi-user notebook server | ✓ | — |
| mosquitto | MQTT broker for FROST stack only | ✓ | — |
| frost | OGC SensorThings API (postgis + dedicated bus) | ✓ | — |
## Sites


@@ -68,5 +68,12 @@ MLFLOW_DB_USER=mlflow
MLFLOW_DB_PASSWORD=
# JupyterHub
# STUB AUTH: DummyAuthenticator. Set a strong shared password — any username + this password logs in.
# Replace with Keycloak OIDC (GenericOAuthenticator) before exposing to users beyond the cloud operator.
JUPYTER_NOTEBOOK_IMAGE=jupyter/datascience-notebook:latest
JUPYTERHUB_ADMIN_USERS=
JUPYTERHUB_ADMIN_PASSWORD=
# FROST (SensorThings — dedicated postgis + internal mosquitto bus, ingressed at frost.wbd-rd.nl)
FROST_DB_PASSWORD=
FROST_SERVICE_ROOT_URL=https://frost.wbd-rd.nl/FROST-Server


@@ -12,10 +12,27 @@ See [`../docs/architecture.md`](../docs/architecture.md) for the full network to
```bash
cp .env.example .env # fill in real secrets first
docker compose up -d
docker compose ps
./deploy.sh # one-shot bring-up: containers + cert + smoke test
```
`deploy.sh` is idempotent — rerun any time. It will:
1. **Preflight** — check `.env` has all required vars
2. **Validate** `docker compose config`
3. **Bring up** containers, wait for `sql` healthcheck, wait for nginx :80
4. **Inspect cert** — figure out whether the current cert is self-signed, staging, or prod
5. **Issue / renew** the SAN cert via certbot only when needed (initial issuance, or when `ACME_CA_URI` no longer matches the current issuer); reload nginx
6. **Status** — show `docker compose ps`
7. **Smoke test** every `*.wbd-rd.nl` vhost over loopback
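
The two waits in step 3 follow the same poll-and-retry pattern. A minimal sketch of that pattern, assuming a hypothetical helper named `wait_until` (it is not part of `deploy.sh`, which inlines its loops):

```bash
# wait_until <attempts> <delay-seconds> <command...>
# Hypothetical helper: run the command until it succeeds or the attempts run out.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0            # command succeeded, stop polling
    fi
    i=$((i + 1))
    # Sleep only when another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1                # every attempt failed
}
```

For example, `wait_until 30 2 curl -s -o /dev/null http://127.0.0.1/` would poll nginx for up to a minute and succeed as soon as any HTTP response arrives.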
The script reissues the cert **only** when the CA in `.env` changes (e.g. staging → prod) or when only the bootstrap dummy is present — it does not waste Let's Encrypt rate limits on repeated runs.
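
The reissue rule can be read as a small decision table. The sketch below mirrors the logic in `deploy.sh`; the function name `decide_cert_action` is illustrative only, since the script computes the same result inline:

```bash
# decide_cert_action <current> <target>
#   current: SELFSIGNED | STAGING | PROD | UNKNOWN (derived from the cert on disk)
#   target:  STAGING | PROD (derived from ACME_CA_URI)
# Prints: none | initial | renew
decide_cert_action() {
  cur=$1
  want=$2
  if [ "$cur" = "$want" ]; then
    echo none       # cert already comes from the requested CA
  elif [ "$cur" = SELFSIGNED ] || [ "$cur" = UNKNOWN ]; then
    echo initial    # only the bootstrap dummy (or nothing readable) is present
  else
    echo renew      # a real cert exists but was issued by the other CA
  fi
}
```

Only `initial` and `renew` contact Let's Encrypt, so a rerun with an unchanged `.env` resolves to `none` and spends no rate limit.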
### Staging → prod flip
1. Verify everything works with the staging cert (browser will warn — that's normal)
2. Edit `.env`: change `ACME_CA_URI` to `https://acme-v02.api.letsencrypt.org/directory`
3. `./deploy.sh` — script detects the CA change and force-renews against prod
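
Step 2 can also be scripted instead of edited by hand. A minimal sketch, assuming `.env` holds exactly one `ACME_CA_URI=` line; the helper name `set_acme_ca` is hypothetical and not part of this repo:

```bash
# set_acme_ca <env-file> <acme-directory-url>
# Hypothetical helper: rewrite only the ACME_CA_URI line, leave every other line untouched.
set_acme_ca() {
  env_file=$1
  url=$2
  tmp=$(mktemp) || return 1
  if sed "s|^ACME_CA_URI=.*|ACME_CA_URI=$url|" "$env_file" > "$tmp"; then
    # Write back through the existing file so its permissions are preserved.
    cat "$tmp" > "$env_file"
  fi
  rm -f "$tmp"
}
```

Usage would then be `set_acme_ca .env https://acme-v02.api.letsencrypt.org/directory && ./deploy.sh`.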
## Ingress (host port bindings)
| Port | Container |


@@ -1,40 +1,60 @@
# Cloud / Central layer composition.
# Includes all cloud-relevant stacks and defines the 4-network topology.
# Run: cp .env.example .env && docker compose up -d
# Pulls in every stack that runs on the central hub and adds cross-stack
# dependencies (the per-stack composes stay standalone-runnable).
#
# Fresh-deploy procedure (see ../docs/architecture.md for the long version):
# 1. cp .env.example .env && fill secrets
# 2. Set DNS A records for the 10 short subdomains + vpn.wbd-rd.nl
# 3. docker compose up -d
# - nginx-init creates a self-signed bootstrap cert
# - sql comes up, init.d/01-databases.sh provisions per-app DBs
# - keycloak / gitea / mlflow wait on sql healthcheck before starting
# 4. ./deploy.sh — single command. Brings everything up, runs first-time cert
# issuance via certbot HTTP-01 (SAN over all *.wbd-rd.nl), reloads nginx,
# smoke-tests every vhost. Idempotent; safe to rerun.
# 5. Flip ACME_CA_URI from staging → prod in .env, ./deploy.sh again.
name: cloud
# Uncomment includes as each stack is hardened beyond stub.
include:
# Foundation (round 3) — ingress, auth backing store, ops console
# Foundation — ingress, DB, ops console
- ../stacks/nginx-proxy/compose.yml
- ../stacks/sql/compose.yml
- ../stacks/portainer/compose.yml
# Core identity + VPN
# - ../stacks/wireguard-server/compose.yml
# Identity + VPN
- ../stacks/keycloak/compose.yml
- ../stacks/wireguard-server/compose.yml
# Data
# - ../stacks/influxdb/compose.yml
- ../stacks/influxdb/compose.yml
# Apps
# - ../stacks/node-red/compose.yml
# - ../stacks/grafana/compose.yml
- ../stacks/node-red/compose.yml
- ../stacks/grafana/compose.yml
- ../stacks/gitea/compose.yml
# - ../stacks/jenkins/compose.yml
- ../stacks/jenkins/compose.yml
# Messaging + mail
# - ../stacks/rabbitmq/compose.yml
# - ../stacks/postfix/compose.yml
- ../stacks/rabbitmq/compose.yml
- ../stacks/postfix/compose.yml
# ML / notebooks
# - ../stacks/mlflow/compose.yml
# - ../stacks/jupyterhub/compose.yml
# FROST (when deployed)
# - ../stacks/mosquitto/compose.yml
- ../stacks/mlflow/compose.yml
- ../stacks/jupyterhub/compose.yml
# SensorThings
- ../stacks/frost/compose.yml
# NOTE on portainer transition:
# The portainer stack publishes 9443+8000 for standalone first-run use.
# When bringing it up through this cloud compose, take the standalone
# instance down first (`cd stacks/portainer && docker compose down`) and
# comment out the `ports:` block in stacks/portainer/compose.yml so
# nginx-proxy is the only ingress. Access then via https://portainer.wbd-rd.nl/.
# Cross-stack dependencies. Declared at the cloud level so each stack's
# own compose.yml stays standalone-runnable (no required peers).
services:
keycloak:
depends_on:
sql:
condition: service_healthy
gitea:
depends_on:
sql:
condition: service_healthy
mlflow:
depends_on:
sql:
condition: service_healthy
networks:
edge:

248
cloud/deploy.sh Normal file

@@ -0,0 +1,248 @@
#!/usr/bin/env bash
# cloud/deploy.sh — one-shot bring-up for the cloud composition.
#
# Idempotent. Safe to rerun. Will reissue the Let's Encrypt cert only when:
# - the current cert is the self-signed bootstrap dummy, or
# - .env's ACME_CA_URI no longer matches the issuer of the current cert
# (e.g. you flipped staging → prod).
#
# Usage:
# cd cloud && ./deploy.sh
set -euo pipefail
cd "$(dirname "$0")"
# ---------- UI ----------
if [ -t 1 ]; then
B=$'\e[34m'; G=$'\e[32m'; Y=$'\e[33m'; R=$'\e[31m'; D=$'\e[2m'; N=$'\e[0m'
else
B=""; G=""; Y=""; R=""; D=""; N=""
fi
STEP=0; TOTAL=7
step() { STEP=$((STEP+1)); printf "\n${B}[%d/%d]${N} %s\n" "$STEP" "$TOTAL" "$*"; }
ok() { printf " ${G}[OK]${N} %s\n" "$*"; }
info() { printf " ${D}...${N} %s\n" "$*"; }
warn() { printf " ${Y}[!]${N} %s\n" "$*"; }
fail() { printf " ${R}[X]${N} %s\n" "$*"; }
die() { fail "$*"; exit 1; }
trap 'rc=$?; [ "$rc" -ne 0 ] && printf "\n${R}DEPLOY FAILED${N} (exit $rc) at step $STEP/$TOTAL\n"' EXIT
# Subdomains covered by the SAN cert (kept in lock-step with nginx-proxy vhosts)
HOSTS=(
git.wbd-rd.nl auth.wbd-rd.nl dash.wbd-rd.nl flow.wbd-rd.nl
ml.wbd-rd.nl hub.wbd-rd.nl ops.wbd-rd.nl mq.wbd-rd.nl
ci.wbd-rd.nl mqtt.wbd-rd.nl portainer.wbd-rd.nl
frost.wbd-rd.nl
)
# ---------- 1. Preflight ----------
step "Preflight"
[ -f .env ] || die ".env missing (cp .env.example .env and fill secrets)"
ok ".env present"
command -v docker >/dev/null || die "docker not installed"
docker compose version >/dev/null 2>&1 || die "docker compose plugin missing"
ok "docker $(docker --version | awk '{print $3}' | tr -d ,)"
ok "docker compose $(docker compose version --short)"
# Source .env so the script can read the variables. Note: set -a also exports
# them to every child process started below (docker, curl, ...).
set -a; . ./.env; set +a
REQUIRED=(
LETSENCRYPT_EMAIL ACME_CA_URI
KEYCLOAK_ADMIN_PASSWORD KEYCLOAK_DB_PASSWORD
SQL_PASSWORD
GITEA_DB_PASSWORD GITEA_OAUTH_CLIENT_SECRET
GRAFANA_ADMIN_PASSWORD
INFLUX_ADMIN_PASSWORD INFLUX_ADMIN_TOKEN
RABBITMQ_PASSWORD
JENKINS_ADMIN_PASSWORD
MLFLOW_DB_PASSWORD
JUPYTERHUB_ADMIN_PASSWORD
FROST_DB_PASSWORD
WG_SERVER_PUBLIC_HOST
)
missing=0
for v in "${REQUIRED[@]}"; do
if [ -z "${!v:-}" ]; then warn "\$$v is empty in .env"; missing=$((missing+1)); fi
done
[ "$missing" -eq 0 ] || die "$missing required env var(s) empty"
ok "required env vars present"
# ---------- 2. Validate compose ----------
step "Validate compose"
docker compose config --quiet || die "docker compose config invalid"
services_total=$(docker compose config --services | wc -l)
ok "compose valid, $services_total services defined"
# ---------- 3. Bring up containers ----------
step "Bring up containers (docker compose up -d)"
docker compose up -d --remove-orphans
ok "containers requested"
# Wait for postgres healthy (longest dep — gates keycloak/gitea/mlflow)
info "waiting for sql to become healthy ..."
sql_cid=$(docker compose ps -q sql)
[ -n "$sql_cid" ] || die "sql container not found"
for i in $(seq 1 60); do
state=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$sql_cid" 2>/dev/null || echo "")
case "$state" in
healthy) ok "sql healthy (after ${i} probe(s))"; break;;
starting|"") sleep 2;;
unhealthy) die "sql reports unhealthy — check 'docker compose logs sql'";;
none) warn "sql has no healthcheck — proceeding anyway"; break;;
esac
[ "$i" -eq 60 ] && die "sql did not become healthy within 120s"
done
# Wait for nginx accepting on :80 (nginx-init must have produced the bootstrap cert)
info "waiting for nginx :80 ..."
for i in $(seq 1 30); do
# curl already prints 000 when no HTTP response arrives; "|| echo 000" would append a second 000.
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 -H "Host: ping" http://127.0.0.1/ 2>/dev/null || true)
if [ "$code" != "000" ]; then ok "nginx :80 responding (HTTP $code) after ${i} probe(s)"; break; fi
sleep 2
[ "$i" -eq 30 ] && die "nginx :80 unreachable — check 'docker compose logs nginx-init nginx'"
done
# ---------- 4. Detect cert state ----------
step "Inspect TLS cert state"
CERT_PATH=/etc/letsencrypt/live/infra/fullchain.pem
# nginx:1.27-alpine doesn't ship openssl; the certbot image does.
cert_subj=$(docker compose run --rm --entrypoint openssl certbot \
x509 -in "$CERT_PATH" -noout -subject 2>/dev/null || echo "")
cert_iss=$(docker compose run --rm --entrypoint openssl certbot \
x509 -in "$CERT_PATH" -noout -issuer 2>/dev/null || echo "")
case "$ACME_CA_URI" in
*acme-staging*) want_ca=STAGING;;
*) want_ca=PROD;;
esac
cur_ca=UNKNOWN
case "$cert_subj" in *bootstrap-infra*) cur_ca=SELFSIGNED;; esac
if [ "$cur_ca" = "UNKNOWN" ]; then
case "$cert_iss" in
*STAGING*|*Fake*|*staging*) cur_ca=STAGING;;
*Encrypt*|*ISRG*|*\ R3*|*\ R10*|*\ R11*|*\ E1*|*\ E5*|*\ E6*) cur_ca=PROD;;
esac
fi
ok "current cert: $cur_ca / target: $want_ca"
# Decide what to do: 'none', 'initial' (no certbot lineage yet), or 'renew' (lineage exists but wrong CA).
if [ "$cur_ca" = "$want_ca" ]; then
action="none"; reason=""
elif [ "$cur_ca" = "SELFSIGNED" ] || [ "$cur_ca" = "UNKNOWN" ]; then
action="initial"; reason="bootstrap → $want_ca"
else
action="renew"; reason="$cur_ca → $want_ca"
fi
# ---------- 5. Issue / renew cert ----------
step "Cert issuance"
if [ "$action" = "none" ]; then
ok "no issuance needed (cert matches ACME_CA_URI)"
else
warn "$reason"
d_args=()
for h in "${HOSTS[@]}"; do d_args+=(-d "$h"); done
# For 'initial': move the bootstrap dummy aside into a backup location so certbot
# can create a fresh lineage. Restore from backup if certbot fails so nginx
# always has *some* cert available on the next restart.
if [ "$action" = "initial" ]; then
info "moving bootstrap cert aside before issuance ..."
docker compose run --rm --entrypoint sh certbot -c '
set -e
mkdir -p /etc/letsencrypt/_backup
rm -rf /etc/letsencrypt/_backup/*
for p in live/infra archive/infra renewal/infra.conf; do
[ -e "/etc/letsencrypt/$p" ] && mv "/etc/letsencrypt/$p" "/etc/letsencrypt/_backup/$(echo $p | tr / -)" || true
done
' >/dev/null
force_flag=""
else
force_flag="--force-renewal"
fi
if docker compose run --rm --entrypoint certbot certbot \
certonly --webroot -w /var/www/certbot \
--server "$ACME_CA_URI" \
--email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \
--cert-name infra --non-interactive --keep-until-expiring \
$force_flag \
"${d_args[@]}"; then
ok "cert issued by $want_ca CA"
# Issuance OK: discard backup
if [ "$action" = "initial" ]; then
docker compose run --rm --entrypoint sh certbot -c \
"rm -rf /etc/letsencrypt/_backup" >/dev/null
fi
docker compose exec -T nginx nginx -s reload
ok "nginx reloaded with new cert"
else
# Restore backup so nginx still has a working cert next time it restarts
if [ "$action" = "initial" ]; then
warn "restoring bootstrap cert after failed issuance ..."
docker compose run --rm --entrypoint sh certbot -c '
for f in /etc/letsencrypt/_backup/*; do
[ -e "$f" ] || continue
dest=/etc/letsencrypt/$(basename "$f" | sed "s/-/\//")
mkdir -p "$(dirname "$dest")"
mv "$f" "$dest"
done
rmdir /etc/letsencrypt/_backup 2>/dev/null || true
' >/dev/null
fi
die "certbot failed — DNS A records pointing at this host?"
fi
fi
# ---------- 6. Service status ----------
step "Service status"
running=0; total=0
while IFS= read -r line; do
total=$((total+1))
case "$line" in *" running") running=$((running+1));; esac
done < <(docker compose ps --format '{{.Name}} {{.State}}')
docker compose ps --format 'table {{.Name}}\t{{.Status}}' | sed 's/^/ /'
ok "$running/$total containers running"
# ---------- 7. Endpoint smoke test ----------
step "Endpoint smoke test (loopback)"
reachable=0; unreachable=0
for h in "${HOSTS[@]}"; do
# curl prints 000 itself on connection failure; "|| true" only keeps set -e from aborting.
code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 \
--resolve "$h:443:127.0.0.1" "https://$h/" 2>/dev/null || true)
case "$code" in
2*|3*) ok "$h → HTTP $code"; reachable=$((reachable+1));;
4*) ok "$h → HTTP $code (auth gate — vhost OK)"; reachable=$((reachable+1));;
5*) warn "$h → HTTP $code (vhost OK, upstream not ready)"; reachable=$((reachable+1));;
000) fail "$h → unreachable"; unreachable=$((unreachable+1));;
*) warn "$h → HTTP $code"; reachable=$((reachable+1));;
esac
done
# ---------- Summary ----------
echo
if [ "$unreachable" -eq 0 ] && [ "$running" -eq "$total" ]; then
printf "${G}DEPLOY OK${N}: $running/$total containers, $reachable/${#HOSTS[@]} endpoints reachable, cert: $want_ca\n"
else
printf "${Y}DEPLOY COMPLETED WITH WARNINGS${N}: $running/$total containers, $unreachable unreachable endpoint(s)\n"
fi
if [ "$want_ca" = "STAGING" ]; then
printf "\n${D}Next: when staging looks right, flip ACME_CA_URI to the prod URL in .env and rerun this script.${N}\n"
fi
trap - EXIT


@@ -22,7 +22,7 @@ R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:
│ rabbitmq, postfix, portainer │
│ sql (postgres, single config) │
│ mlflow, jupyterhub │
mosquitto (FROST stack only)
frost (SensorThings API)
└───────────────┬────────────────────┘
│ WireGuard tunnels
┌───────┼────────┬───────────┐
@@ -56,13 +56,14 @@ Each layer uses **four internal Docker networks**:
```
edge : nginx, wireguard-server
app : nginx, rabbitmq, postfix, node-red, grafana,
jenkins, gitea, keycloak, mlflow, jupyterhub
jenkins, gitea, keycloak, mlflow, jupyterhub,
portainer, frost-http, frost-mqtt
data : influxdb, sql, grafana, mlflow
mgmt : portainer, keycloak, wireguard-server, jupyterhub
frost-internal (private to frost stack) :
frost-db (postgis), frost-mosquitto, frost-http, frost-mqtt
```
(`mosquitto` joins `app` only when the FROST stack is deployed.)
### Edge attachments
```
@@ -117,12 +118,13 @@ For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer
Postfix is **outbound-only**. It initiates SMTP to internet MX servers but accepts no inbound. Zero ingress, no published port, no listener facing internet. Just needs egress (every container has it via host NAT).
### MQTT — two brokers
### MQTT — RabbitMQ for public traffic, dedicated mosquitto inside FROST
- **RabbitMQ** is the **general-purpose** broker. Runs at both cloud and edge. MQTT plugin enabled. Cloud-side reachable externally via nginx stream proxy on `tcp/8883`. Edge-side fully internal.
- **Mosquitto** is reserved for the **FROST (SensorThings API) stack** only — cloud-only. Internal to its own stack — no external ingress unless FROST publishers need to push from outside (in which case use a separate stream block on a different port).
- **RabbitMQ** is the **only public MQTT broker**. SCADA / IoT / edge clients connect to `mqtt.wbd-rd.nl:8883` (TLS, via nginx `stream {}` block proxying to `rabbitmq:1883`). Authentication uses the standard RABBITMQ_USER/PASS.
- **frost-mosquitto** lives **inside the frost stack** on the private `frost-internal` docker network — it is purely the message bus between `frost-http` and `frost-mqtt`. It is not reachable from anywhere outside the frost stack.
- SensorThings-protocol MQTT (the FROST native MQTT API) is exposed to clients via `frost-mqtt`'s WebSocket port, proxied as `https://frost.wbd-rd.nl/mqtt`.
If FROST needs cross-broker forwarding, add a RabbitMQ `shovel` plugin pointing at `mosquitto`. Not wired up by default.
If FROST consumers also need to see SCADA traffic on RabbitMQ, add a RabbitMQ `shovel` plugin pointing into the frost stack. Not wired up by default.
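
A quick way to exercise the public broker path is a test publish with the `mosquitto_pub` client. The sketch below is an illustration under assumptions: the helper name `publish_probe`, the `MQTT_HOST`, `MQTT_PORT` and `DRY_RUN` variables, and the CA directory `/etc/ssl/certs` are not defined anywhere in this repo, and `mosquitto_pub` must be installed on the client.

```bash
# publish_probe <topic> <message>
# Hypothetical helper: publish one message to the public RabbitMQ MQTT listener over TLS.
# Set DRY_RUN=1 to print the command instead of running it.
publish_probe() {
  topic=$1
  message=$2
  # Build the argument list; host and port default to the public endpoint.
  set -- mosquitto_pub \
    -h "${MQTT_HOST:-mqtt.wbd-rd.nl}" -p "${MQTT_PORT:-8883}" \
    --capath /etc/ssl/certs \
    -u "$RABBITMQ_USER" -P "$RABBITMQ_PASSWORD" \
    -t "$topic" -m "$message"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"    # note: this prints the password as well
  else
    "$@"
  fi
}
```

With a staging cert the TLS verification is expected to fail, so a probe like this is mainly useful after the prod cert has been issued.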
### Gitea — HTTPS only
@@ -137,7 +139,7 @@ WireGuard is connectionless UDP with crypto-routed packets. Proxying through ngi
The repo defines **16 stacks** under `stacks/`:
- **Cloud + edge**: `nginx-proxy`, `node-red`, `influxdb`, `grafana`, `keycloak`, `portainer`, `rabbitmq`, `postfix`
- **Cloud-only**: `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `mosquitto` (FROST)
- **Cloud-only**: `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `frost` (SensorThings, dedicated postgis + internal bus)
- **Edge-only**: `wireguard-client`
## Sites
@@ -164,6 +166,7 @@ Tracked here so we don't forget. Each lands when we harden the relevant stack.
- **MinIO / artifact store** — MLflow uses local volume for now; switch to S3-compatible MinIO sidecar when artifacts grow.
- **JupyterHub auth** — target Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`.
- **WG client routing** — split-tunnel vs full; per-peer `AllowedIPs` policy.
- **FROST auth** — currently `BasicAuthProvider` against the USERS table in `frost-db`; swap to Keycloak OIDC via the FROST OIDC plugin when SSO is rolled out.
- **MQTT cross-broker shovel** — only if FROST consumers must see RabbitMQ traffic or vice versa.
- **Internal PKI** — for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
- **Backup strategy** — for `sql` (postgres), `influxdb`, `gitea-data`, `jenkins-home`, `mlflow-artifacts`.

42
stacks/frost/README.md Normal file

@@ -0,0 +1,42 @@
# frost
[FROST-Server](https://github.com/FraunhoferIOSB/FROST-Server) — an OGC SensorThings API server. Stores sensors, observations, datastreams in postgis; exposes REST + MQTT.
- **Public hostname**: `frost.wbd-rd.nl`
- `/FROST-Server` → REST + admin UI (frost-http:8080)
- `/mqtt` → WebSocket MQTT for SensorThings clients (frost-mqtt:9876)
- **Networks**: `frost-internal` (private bus) + `app` (nginx ingress)
- **Backend**: dedicated `postgis/postgis:16-3.4-alpine` container — segregated from the shared `sql` stack
- **Internal bus**: dedicated `eclipse-mosquitto` for frost-http ↔ frost-mqtt sync (not reachable from outside the stack)
- **Public MQTT broker for SCADA/IoT clients**: that's `rabbitmq` (port 8883 TLS via nginx stream), NOT this stack
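
For orientation, SensorThings entity sets live under the `v1.1` path of the service root. A minimal sketch of building such URLs, assuming a hypothetical helper `sta_url` (not part of this stack) and the `FROST_SERVICE_ROOT_URL` default from `.env.example`:

```bash
# sta_url <entity-set> [query-string]
# Hypothetical helper: build a SensorThings v1.1 URL below the configured service root.
sta_url() {
  root=${FROST_SERVICE_ROOT_URL:-https://frost.wbd-rd.nl/FROST-Server}
  url="${root%/}/v1.1/$1"           # drop a trailing slash on the root, if any
  if [ -n "${2:-}" ]; then
    url="$url?$2"
  fi
  printf '%s\n' "$url"
}
```

A read could then look like `curl -u admin:CHANGE_ME "$(sta_url Things)"`, assuming the admin user from the first-run steps exists.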
## Volumes (persistent)
- `frost-db-data` — postgis data dir
- `frost-mosquitto-data`, `frost-mosquitto-log` — internal bus state
Container can be recreated freely; no data loss as long as volumes are kept.
## First-run
1. `docker compose up -d frost-db frost-mosquitto` (or just `up -d` for the full stack — frost-http waits on the db healthcheck)
2. `frost-http` will auto-create the schema (`persistence_autoUpdateDatabase=true`) on first start
3. Create the admin user (one-time, post-deploy — the USERS table is created by FROST itself):
```bash
docker compose exec frost-db psql -U sensorthings -d sensorthings -c \
"INSERT INTO \"USERS\" (\"USER_NAME\", \"USER_PASS\") VALUES ('admin', crypt('CHANGE_ME', gen_salt('bf', 12)));"
```
Subsequent password rotations:
```bash
docker compose exec frost-db psql -U sensorthings -d sensorthings -c \
"UPDATE \"USERS\" SET \"USER_PASS\"=crypt('NEW_PW', gen_salt('bf', 12)) WHERE \"USER_NAME\"='admin';"
```
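
Passwords containing a single quote would break the inline SQL above. A minimal sketch of generating the statement with proper quoting; the helper names `sql_quote` and `frost_password_sql` are hypothetical and not shipped with this stack:

```bash
# sql_quote <string>
# Wrap a value in single quotes, doubling any embedded single quote (standard SQL escaping).
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# frost_password_sql <user> <new-password>
# Print the UPDATE statement used for password rotation, with both values quoted.
frost_password_sql() {
  printf "UPDATE \"USERS\" SET \"USER_PASS\"=crypt(%s, gen_salt('bf', 12)) WHERE \"USER_NAME\"=%s;\n" \
    "$(sql_quote "$2")" "$(sql_quote "$1")"
}
```

The output can be passed to the same command as above: `docker compose exec frost-db psql -U sensorthings -d sensorthings -c "$(frost_password_sql admin "$NEW_PW")"`.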
## TODO
- Switch from `BasicAuthProvider` to Keycloak OIDC (FROST has a plugin)
- Bootstrap admin user automatically (post-init container that waits for FROST schema, then runs the SQL above with `${FROST_ADMIN_PASSWORD}`)
- Document the SensorThings client examples (Things, Datastreams, Observations)
- pgadmin / db inspection: use shared portainer or a one-off `psql` exec

130
stacks/frost/compose.yml Normal file

@@ -0,0 +1,130 @@
# frost — FROST-Server (OGC SensorThings API) (cloud only)
# Public hostname: frost.wbd-rd.nl (reverse-proxied via nginx-proxy)
# /FROST-Server → frost-http:8080 (REST + UI)
# /mqtt → frost-mqtt:9876 (WebSocket MQTT for STA clients)
#
# Networks:
# frost-internal : private bus (db ↔ frost-* ↔ mosquitto). No outside reach.
# app : where frost-http / frost-mqtt expose ports to nginx.
#
# Dedicated postgis DB + dedicated mosquitto bus — maximum segregation from the
# shared sql / rabbitmq stacks. Public MQTT for SCADA clients goes via rabbitmq.
services:
# --- DB: dedicated postgis instance for FROST ---------------------------------
frost-db:
image: postgis/postgis:16-3.4-alpine
restart: unless-stopped
networks: [frost-internal]
environment:
POSTGRES_DB: sensorthings
POSTGRES_USER: sensorthings
POSTGRES_PASSWORD: ${FROST_DB_PASSWORD}
TZ: ${TZ:-Europe/Amsterdam}
volumes:
- frost-db-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -d sensorthings -U sensorthings"]
interval: 5s
timeout: 5s
retries: 12
start_period: 30s
# --- Internal message bus ------------------------------------------------------
# Used solely for frost-http ↔ frost-mqtt synchronisation. Not for external clients.
frost-mosquitto:
image: eclipse-mosquitto:2.0
restart: unless-stopped
networks: [frost-internal]
volumes:
- ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
- frost-mosquitto-data:/mosquitto/data
- frost-mosquitto-log:/mosquitto/log
# --- HTTP (REST + admin UI) ---------------------------------------------------
frost-http:
image: fraunhoferiosb/frost-server-http:latest
restart: unless-stopped
networks: [frost-internal, app]
depends_on:
frost-db:
condition: service_healthy
frost-mosquitto:
condition: service_started
environment:
serviceRootUrl: ${FROST_SERVICE_ROOT_URL:-https://frost.wbd-rd.nl/FROST-Server}
queueLoggingInterval: "1000"
plugins_multiDatastream_enable: "false"
http_cors_enable: "true"
http_cors_allowed_origins: "*"
bus_busImplementationClass: de.fraunhofer.iosb.ilt.frostserver.messagebus.MqttMessageBus
bus_mqttBroker: tcp://frost-mosquitto:1883
bus_sendQueueSize: "2000"
bus_sendWorkerPoolSize: "10"
bus_maxInFlight: "2000"
persistence_db_driver: org.postgresql.Driver
persistence_db_url: jdbc:postgresql://frost-db:5432/sensorthings
persistence_db_username: sensorthings
persistence_db_password: ${FROST_DB_PASSWORD}
persistence_autoUpdateDatabase: "true"
# BasicAuth against USERS table in postgis. Swap to Keycloak OIDC later.
auth_provider: de.fraunhofer.iosb.ilt.frostserver.auth.basic.BasicAuthProvider
auth_db_driver: org.postgresql.Driver
auth_db_url: jdbc:postgresql://frost-db:5432/sensorthings
auth_db_username: sensorthings
auth_db_password: ${FROST_DB_PASSWORD}
auth_plainTextPassword: "false"
auth_autoUpdateDatabase: "true"
TZ: ${TZ:-Europe/Amsterdam}
# --- MQTT (SensorThings MQTT endpoint, with WebSocket on 9876) -----------------
frost-mqtt:
image: fraunhoferiosb/frost-server-mqtt:latest
restart: unless-stopped
networks: [frost-internal, app]
depends_on:
frost-db:
condition: service_healthy
frost-mosquitto:
condition: service_started
environment:
serviceRootUrl: ${FROST_SERVICE_ROOT_URL:-https://frost.wbd-rd.nl/FROST-Server}
queueLoggingInterval: "1000"
plugins_multiDatastream_enable: "false"
bus_busImplementationClass: de.fraunhofer.iosb.ilt.frostserver.messagebus.MqttMessageBus
bus_mqttBroker: tcp://frost-mosquitto:1883
mqtt_CreateThreadPoolSize: "10"
mqtt_CreateMessageQueueSize: "10000"
mqtt_SubscribeThreadPoolSize: "20"
mqtt_SubscribeMessageQueueSize: "10000"
persistence_persistenceManagerImplementationClass: de.fraunhofer.iosb.ilt.sta.persistence.postgres.PostgresPersistenceManager
persistence_db_driver: org.postgresql.Driver
persistence_db_url: jdbc:postgresql://frost-db:5432/sensorthings
persistence_db_username: sensorthings
persistence_db_password: ${FROST_DB_PASSWORD}
auth_provider: de.fraunhofer.iosb.ilt.frostserver.auth.basic.BasicAuthProvider
auth_db_driver: org.postgresql.Driver
auth_db_url: jdbc:postgresql://frost-db:5432/sensorthings
auth_db_username: sensorthings
auth_db_password: ${FROST_DB_PASSWORD}
auth_plainTextPassword: "false"
auth_autoUpdateDatabase: "true"
TZ: ${TZ:-Europe/Amsterdam}
networks:
app:
frost-internal:
driver: bridge
internal: true
volumes:
frost-db-data:
frost-mosquitto-data:
frost-mosquitto-log:


@@ -0,0 +1,13 @@
# Internal FROST message bus. Reached only from frost-http / frost-mqtt over the
# frost-internal docker network. No external listener, no auth needed.
listener 1883
allow_anonymous true
persistence true
persistence_location /mosquitto/data/
log_dest stdout
log_type error
log_type warning
log_type notice


@@ -14,6 +14,9 @@ services:
TZ: ${TZ:-Europe/Amsterdam}
DOCKER_NOTEBOOK_IMAGE: ${JUPYTER_NOTEBOOK_IMAGE:-jupyter/datascience-notebook:latest}
DOCKER_NETWORK_NAME: cloud-app
# Stub auth — DummyAuthenticator gates on this shared password until OIDC is wired.
JUPYTERHUB_ADMIN_PASSWORD: ${JUPYTERHUB_ADMIN_PASSWORD}
JUPYTERHUB_ADMIN_USERS: ${JUPYTERHUB_ADMIN_USERS}
# TODO: DockerSpawner config in jupyterhub_config.py; Keycloak OAuthAuthenticator;
# preinstalled libraries; per-user persistent volumes; CPU/memory limits


@@ -0,0 +1,33 @@
# JupyterHub bootstrap config.
#
# WARNING: this is a STUB. It uses DummyAuthenticator with a single shared
# password and LocalProcessSpawner. It boots, it's password-gated, but it is
# NOT the production setup. Before exposing this to anything beyond the
# cloud-host operator, swap to:
# - GenericOAuthenticator pointed at Keycloak (wbd realm, jupyterhub client)
# - DockerSpawner with per-user persistent volumes
# See stacks/jupyterhub/README.md TODO.
import os
c = get_config() # noqa: F821 — provided by JupyterHub
# --- Authenticator (stub) -----------------------------------------------------
c.JupyterHub.authenticator_class = "dummy"
c.DummyAuthenticator.password = os.environ["JUPYTERHUB_ADMIN_PASSWORD"]
admin_users = os.environ.get("JUPYTERHUB_ADMIN_USERS", "").strip()
if admin_users:
c.Authenticator.admin_users = {u.strip() for u in admin_users.split(",") if u.strip()}
c.Authenticator.allow_all = True # stub: any username, single shared password
# --- Spawner (stub) -----------------------------------------------------------
# The "simple" entry point selects SimpleLocalProcessSpawner, which runs notebooks
# as OS processes inside the hub container. Fine for a single operator on the
# stub; production should use DockerSpawner.
c.JupyterHub.spawner_class = "simple"
c.Spawner.default_url = "/lab"
# --- Hub posture --------------------------------------------------------------
c.JupyterHub.bind_url = "http://:8000"
c.JupyterHub.hub_bind_url = "http://0.0.0.0:8081"
c.JupyterHub.cleanup_servers = True

2
stacks/mlflow/Dockerfile Normal file

@@ -0,0 +1,2 @@
FROM ghcr.io/mlflow/mlflow:v2.18.0
RUN pip install --no-cache-dir psycopg2-binary


@@ -3,7 +3,10 @@
services:
mlflow:
image: ghcr.io/mlflow/mlflow:v2.18.0
build:
context: . # custom image: upstream + psycopg2-binary for postgres backend
dockerfile: Dockerfile
image: cloud-mlflow:v2.18.0
restart: unless-stopped
networks: [app, data]
command: >


@@ -1 +0,0 @@
# mosquitto — broker uses config file, no env vars in stub


@@ -1,11 +0,0 @@
# mosquitto
Eclipse Mosquitto MQTT broker. **Reserved for the FROST (SensorThings API) stack** — separate from the general-purpose `rabbitmq` broker. **Cloud-only.**
- **Network**: `app` (internal only — FROST services connect via service name `mosquitto`)
- **No external ingress** by default. If FROST needs external MQTT publishers, route them through a separate nginx stream block on a different port (not 8883 — that belongs to rabbitmq).
- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence
- **TODO**:
- ACL aligned with FROST topic structure
- Persistence retention policy
- Optional shovel from `rabbitmq` if cross-broker forwarding is needed


@@ -1,22 +0,0 @@
# mosquitto — MQTT broker reserved for the FROST (SensorThings) stack
# Cloud-only. Internal to its own stack; no external ingress by default.
# Networks: app
services:
mosquitto:
image: eclipse-mosquitto:2.0
restart: unless-stopped
networks: [app]
volumes:
- ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
- mosquitto-data:/mosquitto/data
- mosquitto-log:/mosquitto/log
# No 'ports:' — FROST is the only intended consumer. If external MQTT
# access for FROST is needed later, add a separate nginx stream block.
networks:
app:
volumes:
mosquitto-data:
mosquitto-log:


@@ -4,6 +4,27 @@
# Publishes: 80, 443, 8883 on the host
services:
# One-shot init: generate a self-signed dummy cert if /etc/letsencrypt/live/infra/
# doesn't already have one. Lets nginx start on a fresh deploy before certbot has
# issued the real cert via HTTP-01. Subsequent runs are no-ops.
nginx-init:
image: alpine/openssl:latest
restart: "no"
volumes:
- nginx-certs:/etc/letsencrypt
entrypoint: ["/bin/sh", "-c"]
command:
- |
set -eu
d=/etc/letsencrypt/live/infra
if [ ! -s "$$d/fullchain.pem" ] || [ ! -s "$$d/privkey.pem" ]; then
echo "nginx-init: generating self-signed bootstrap cert at $$d"
mkdir -p "$$d"
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout "$$d/privkey.pem" -out "$$d/fullchain.pem" -subj '/CN=bootstrap-infra'
else
echo "nginx-init: cert already present at $$d, skipping"
fi
nginx:
image: nginx:1.27-alpine
restart: unless-stopped
@@ -19,7 +40,10 @@ services:
- nginx-certs:/etc/letsencrypt:ro
- nginx-acme-challenge:/var/www/certbot:ro
depends_on:
- certbot
nginx-init:
condition: service_completed_successfully
certbot:
condition: service_started
certbot:
image: certbot/certbot:latest


@@ -0,0 +1,37 @@
+server {
+    listen 443 ssl;
+    http2 on;
+    server_name frost.wbd-rd.nl;
+    ssl_certificate /etc/letsencrypt/live/infra/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/infra/privkey.pem;
+    # FROST REST + admin UI
+    location /FROST-Server {
+        proxy_pass http://frost-http:8080;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_read_timeout 60s;
+    }
+    # SensorThings MQTT-over-WebSocket
+    location /mqtt {
+        proxy_pass http://frost-mqtt:9876;
+        proxy_http_version 1.1;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+        proxy_read_timeout 3600s; # long-lived MQTT WS sessions
+        proxy_send_timeout 3600s;
+    }
+    # Root redirects to the REST UI for convenience
+    location = / {
+        return 302 /FROST-Server/;
+    }
+}
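
The vhost above addresses `frost-http` and `frost-mqtt` by service name, which only resolves if nginx shares a Docker network with both containers. A minimal sketch of how the FROST services could be attached follows. The FROST compose file is not shown in this section, so the network names (`frost-internal`, `app`), the `internal: true` flag, and the placement of `frost-db` are assumptions based on the commit message, not the actual stack definition.

```yaml
# Hypothetical fragment; names follow the commit message, the layout is assumed.
services:
  frost-http:
    networks: [frost-internal, app]   # app: reachable by nginx as frost-http:8080
  frost-mqtt:
    networks: [frost-internal, app]   # app: reachable by nginx as frost-mqtt:9876
  frost-db:
    networks: [frost-internal]        # database stays off the ingress network

networks:
  frost-internal:
    internal: true                    # assumption: no outbound route from this network
  app:
```

Keeping the database on the private network only means nginx, and anything else on the shared network, cannot reach it directly.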

@@ -4,7 +4,7 @@
 services:
   node-red:
-    image: nodered/node-red:4
+    image: nodered/node-red:4.1
     restart: unless-stopped
     networks: [app]
     volumes:

@@ -2,26 +2,15 @@
 Docker container management UI — the "operator console" for cloud and edge.
-## Standalone first-run (cloud)
+## Access
-Bring this up **first**, before nginx-proxy, so you have a GUI from day 1 to inspect containers, logs, networks, and volumes as the rest of the cloud stack comes online.
+Portainer ingresses through nginx-proxy: `https://portainer.wbd-rd.nl/`. No host port is published by default.
-```bash
-cd stacks/portainer
-docker compose up -d
-```
+For emergency ops (nginx down, etc.), uncomment the `ports:` block in `compose.yml` and `docker compose up -d portainer` to expose `:9443` and `:8000` directly.
-Browse `https://<cloud-host>:9443` (self-signed cert — accept once). Create the admin user on first visit.
+## First-run admin
-## After nginx-proxy is up
-Once nginx-proxy + the wildcard cert are working:
-1. Comment the `ports:` block in `compose.yml`.
-2. `docker compose down && docker compose up -d` (or recreate via cloud/compose.yml include).
-3. Browse `https://portainer.wbd-rd.nl/` (real cert, behind nginx).
-The direct `:9443` access is intentionally retained as commented-out config for emergency ops if nginx goes down.
+On first visit, Portainer prompts for an admin username and password. Use a long random password; this account is break-glass — your daily login should come via Keycloak OIDC once that gate is wired (see TODO).
 ## Edge-agent topology

@@ -1,24 +1,25 @@
 # portainer — container management UI (operator console)
-# Networks: mgmt
+# Networks: mgmt (docker socket plane) + app (nginx-proxy reaches HTTPS upstream)
+# Ingress: nginx-proxy → portainer:9443 (self-signed upstream cert) → portainer.wbd-rd.nl
 #
-# Standalone deploy publishes 9443 directly so you have a GUI from day 1,
-# before nginx-proxy + TLS are wired up. Once nginx-proxy is up, comment
-# the `ports:` block and access via https://portainer.wbd-rd.nl/.
+# Direct :9443 host access is intentionally NOT published anymore — re-enable
+# only for emergency ops by uncommenting the `ports:` block below.
 services:
   portainer:
     image: portainer/portainer-ce:2.21.4
     restart: unless-stopped
-    networks: [mgmt]
-    ports:
-      - "9443:9443" # HTTPS UI, self-signed cert (early-stage direct access)
-      - "8000:8000" # Edge-agent reverse tunnel (for edge sites)
+    networks: [mgmt, app]
+    # ports:
+    #   - "9443:9443" # HTTPS UI direct access (emergency ops only)
+    #   - "8000:8000" # Edge-agent reverse tunnel (open when wiring edges)
     volumes:
       - portainer-data:/data
       - /var/run/docker.sock:/var/run/docker.sock:ro
 networks:
   mgmt:
+  app:
 volumes:
   portainer-data:
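
As an alternative to editing `compose.yml` during an incident, the emergency port could be published from a separate override file. A minimal sketch, assuming a hypothetical file named `compose.emergency.yml` placed next to `compose.yml` (this file is not part of the commit):

```yaml
# compose.emergency.yml (hypothetical file name, not part of this commit)
services:
  portainer:
    ports:
      - "9443:9443"   # HTTPS UI, direct host access while nginx is down
```

It would be applied with `docker compose -f compose.yml -f compose.emergency.yml up -d portainer`. Rerunning `docker compose up -d portainer` without the second `-f` recreates the container with no published ports.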

@@ -0,0 +1 @@
+[rabbitmq_management,rabbitmq_mqtt].

@@ -0,0 +1,19 @@
+# RabbitMQ — minimal config for the cloud hub.
+# Listeners:
+#   amqp on 5672 (internal app traffic)
+#   mgmt UI on 15672 (proxied by nginx as mq.wbd-rd.nl)
+#   mqtt on 1883 (cloud-external traffic comes in via nginx stream proxy on tcp/8883 → here)
+#
+# Authentication: default RABBITMQ_DEFAULT_USER/PASS from env (set in cloud/.env).
+# Anonymous MQTT is disabled so the broker rejects unauthenticated clients.
+listeners.tcp.default = 5672
+management.tcp.port = 15672
+mqtt.listeners.tcp.default = 1883
+mqtt.allow_anonymous = false
+mqtt.vhost = /
+mqtt.exchange = amq.topic
+# Network-partition handling mode. `ignore` is RabbitMQ's default and only matters
+# in a cluster; it does not change whether connections are accepted during startup.
+cluster_partition_handling = ignore

@@ -18,6 +18,12 @@ services:
     volumes:
       - sql-data:/var/lib/postgresql/data
       - ./config/init.d:/docker-entrypoint-initdb.d:ro
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U ${SQL_USER} -d ${SQL_DB}"]
+      interval: 10s
+      timeout: 5s
+      retries: 10
+      start_period: 30s
 networks:
   data:
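
With the healthcheck in place, dependent services can wait until PostgreSQL accepts connections instead of only waiting for the container to start. A minimal sketch of such an override, assuming the database service is named `sql` (the service name is not visible in this hunk) and using `keycloak`, one of the dependents named in the commit message, as the example:

```yaml
# Hypothetical override fragment; service names are assumptions.
services:
  keycloak:
    depends_on:
      sql:
        condition: service_healthy   # start only after pg_isready succeeds
```

With the values in the hunk above, probe failures during the first 30 s do not count against the container. After that, 10 consecutive failed probes at 10 s intervals mark it unhealthy, and the dependent service is then not started.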