infra/docs/architecture.md
znetsixe 4117ec6063 feat(cloud): single-shot deploy.sh + FROST stack + healthchecks
Stage 5 — make the cloud composition spin up in one command and add
the SensorThings (FROST) stack as a fully segregated tenant.

cloud/deploy.sh — idempotent, 7-step bring-up:
  preflight → validate → up + wait → cert state → issue/renew →
  service status → endpoint smoke test. Reissues LE cert only when
  current issuer no longer matches ACME_CA_URI. Move-aside-then-
  restore-on-failure so the bootstrap cert survives a failed certbot.

stacks/frost — new stack, segregated from shared sql/rabbitmq:
  - dedicated postgis container (frost-db)
  - dedicated internal mosquitto bus (frost-mosquitto)
  - frost-http + frost-mqtt on a private frost-internal network,
    joined to cloud-app only for nginx ingress at frost.wbd-rd.nl
  - shared mosquitto stack deleted; rabbitmq remains the only public
    MQTT broker (mqtt.wbd-rd.nl:8883 via stream proxy)

stacks/sql — pg_isready healthcheck so keycloak/gitea/mlflow can gate
on service_healthy via cloud-level depends_on overrides.

stacks/nginx-proxy:
  - nginx-init service generates a self-signed bootstrap cert on
    fresh deploy so nginx starts before certbot has issued a real one
  - frost.wbd-rd.nl vhost (/FROST-Server → frost-http:8080,
    /mqtt → frost-mqtt:9876 WebSocket)

stacks/mlflow — custom Dockerfile (upstream + psycopg2-binary) so the
official image can speak to the shared sql backend.

stacks/jupyterhub — DummyAuthenticator stub gated by
JUPYTERHUB_ADMIN_PASSWORD; TODO comments point at OIDC + DockerSpawner.

stacks/rabbitmq — config/{enabled_plugins,rabbitmq.conf} stubs
(management + mqtt plugins, MQTT auth required).

stacks/portainer — ports unpublished; nginx now the only ingress.

stacks/node-red — pin to 4.1 (the floating "4" tag does not exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:37:58 +02:00


Architecture

R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:

  • Cloud layer — central services, one deployment, internet-facing.
  • Edge layer — per-plant, plant-LAN-facing, tunneled to cloud via WireGuard.
  • OT layer — per-plant, behind edge. Managed outside this repo.
                Internet
                    │
        ┌───────────┴───────────┐
        │ tcp/80, 443, 8883     │
        │ udp/51820             │
        ▼                       │
┌────────────────────────────────────┐
│ Cloud (central, one)               │
│   nginx + certbot ◀── 80/443/8883  │
│   wireguard-server ◀── 51820/udp   │
│   gitea, jenkins, keycloak, ...    │
│   influxdb, grafana, node-red      │
│   rabbitmq, postfix, portainer     │
│   sql (postgres, single config)    │
│   mlflow, jupyterhub               │
│   frost (SensorThings API)         │
└───────────────┬────────────────────┘
                │ WireGuard tunnels
        ┌───────┼────────┬───────────┐
        ▼       ▼        ▼           ▼
    ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
    │ Edge: │ │ Edge: │ │ Edge: │ │  ...  │
    │gemaal1│ │ ...   │ │ ...   │ │       │
    └───┬───┘ └───────┘ └───────┘ └───────┘
        │ TLS
        ▼
    ┌──────────┐
    │ OT       │  ← out of scope of this repo
    │ OPCUA    │
    │ PLC      │
    └──────────┘

Network topology (per layer)

Each layer uses four shared Docker networks. The cloud also has frost-internal, a network private to the frost stack:

| Network | Purpose | Notes |
|---------|---------|-------|
| edge | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| app | Application / automation tier. | Default landing for app services. |
| data | Databases (influxdb, sql). | internal: true (no internet egress). |
| mgmt | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |

Cloud attachments

edge   : nginx, wireguard-server
app    : nginx, rabbitmq, postfix, node-red, grafana,
         jenkins, gitea, keycloak, mlflow, jupyterhub,
         portainer, frost-http, frost-mqtt
data   : influxdb, sql, grafana, mlflow
mgmt   : portainer, keycloak, wireguard-server, jupyterhub
frost-internal (private to the frost stack):
         frost-db (postgis), frost-mosquitto, frost-http, frost-mqtt
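The attachments are plain data, so the blast-radius argument under "Why segment" can be checked mechanically. Below is a minimal Python illustration, not tooling from this repo. It transcribes the cloud list above and returns the services that share at least one network with a given service. On Docker user-defined networks, sharing a network is the precondition for direct container-to-container traffic. The helper names are hypothetical. The model ignores indirect paths, such as published host ports or requests relayed through nginx.

```python
# Cloud network attachments, transcribed from the list above.
CLOUD_ATTACHMENTS: dict[str, set[str]] = {
    "edge": {"nginx", "wireguard-server"},
    "app": {
        "nginx", "rabbitmq", "postfix", "node-red", "grafana",
        "jenkins", "gitea", "keycloak", "mlflow", "jupyterhub",
        "portainer", "frost-http", "frost-mqtt",
    },
    "data": {"influxdb", "sql", "grafana", "mlflow"},
    "mgmt": {"portainer", "keycloak", "wireguard-server", "jupyterhub"},
    "frost-internal": {"frost-db", "frost-mosquitto", "frost-http", "frost-mqtt"},
}


def networks_of(service: str) -> set[str]:
    """Names of the networks a service is attached to."""
    return {net for net, members in CLOUD_ATTACHMENTS.items() if service in members}


def direct_peers(service: str) -> set[str]:
    """Services sharing at least one network with `service` (excluding itself)."""
    peers: set[str] = set()
    for net in networks_of(service):
        peers |= CLOUD_ATTACHMENTS[net]
    peers.discard(service)
    return peers


if __name__ == "__main__":
    for svc in ("node-red", "mlflow", "frost-db"):
        print(f"{svc:<10} {sorted(networks_of(svc))} -> {sorted(direct_peers(svc))}")
```

For example, direct_peers("node-red") contains no database, and direct_peers("frost-db") is limited to the other three frost containers.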

Edge attachments

edge   : nginx                                  ← plant-LAN-facing
app    : nginx, rabbitmq, postfix, node-red,
         grafana, keycloak, wireguard-client
data   : influxdb, grafana
mgmt   : portainer, keycloak, wireguard-client

Ingress (the only ports facing outside)

Cloud (the central host)

| Port | Container | Notes |
|------|-----------|-------|
| tcp/80 | nginx | HTTP → 301 to 443; also serves /.well-known/acme-challenge/ for certbot |
| tcp/443 | nginx | All HTTPS UIs; TLS termination |
| tcp/8883 | nginx | MQTT-TLS via stream {} block; SNI route to rabbitmq:1883 |
| udp/51820 | wireguard-server | VPN tunnel ingress |

Two containers publish a total of four ports. Everything else is invisible from outside the host.

Edge (per-plant gateway)

| Port | Container | Bound to |
|------|-----------|----------|
| tcp/80 | nginx | Plant-LAN interface only |
| tcp/443 | nginx | Plant-LAN interface only |

The edge wireguard-client initiates outbound to the cloud — it publishes no port.
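Both ingress tables reduce to an allowlist of (service, published port, protocol). The edge adds one more rule: every port must be bound to a specific interface. Below is a minimal sketch of checking a project against these rules. It reads the normalized output of `docker compose config --format json`, which renders the effective model with include: files and variables resolved. Assumptions: the Compose service names are nginx and wireguard-server, and the function names are hypothetical.

```python
import json
import subprocess
from typing import Iterable, NamedTuple, Optional

# Allowlist from the cloud ingress table: (service, published port, protocol).
# Assumption: the Compose service names are "nginx" and "wireguard-server".
CLOUD_ALLOWED = {
    ("nginx", "80", "tcp"),
    ("nginx", "443", "tcp"),
    ("nginx", "8883", "tcp"),
    ("wireguard-server", "51820", "udp"),
}


class Exposure(NamedTuple):
    service: str
    host_ip: str  # "" means the port is bound on all host interfaces
    published: str
    protocol: str


def exposures(config: dict) -> list:
    """Every host-published port in a normalized Compose config."""
    found = []
    for name, svc in config.get("services", {}).items():
        for port in svc.get("ports", []):
            # Normalized output uses the long port syntax: one mapping per port.
            found.append(Exposure(
                service=name,
                host_ip=port.get("host_ip", ""),
                published=str(port.get("published", "")),
                protocol=port.get("protocol", "tcp"),
            ))
    return found


def violations(config: dict, allowed: Iterable, bind_ip: Optional[str] = None) -> list:
    """Published ports outside the allowlist, or bound to the wrong interface."""
    allowed = set(allowed)
    problems = []
    for e in exposures(config):
        if (e.service, e.published, e.protocol) not in allowed:
            problems.append(f"{e.service} publishes {e.protocol}/{e.published} (not allowlisted)")
        if bind_ip is not None and e.host_ip != bind_ip:
            where = e.host_ip or "all interfaces"
            problems.append(f"{e.service} {e.protocol}/{e.published} bound to {where}, expected {bind_ip}")
    return problems


def load_config(project_dir: str) -> dict:
    """Render the effective project config as JSON via the Docker CLI."""
    result = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        cwd=project_dir, check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On the cloud, `violations(load_config("cloud"), CLOUD_ALLOWED)` should come back empty. For an edge site, pass that site's allowlist and set bind_ip to its PLANT_LAN_IP. This also catches the accidental 0.0.0.0 bindings mentioned under "Why segment".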

TLS strategy

Stock nginx + certbot sidecar (Let's Encrypt, HTTP-01 webroot).

  • Stock nginx:1.27-alpine — required because we use the stream {} context for MQTT-TLS. nginxproxy/nginx-proxy (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly.
  • certbot/certbot sidecar runs certbot renew every 12h. Shared nginx-certs + nginx-acme-challenge volumes coordinate cert + challenge state between the two containers.
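  • Initial issuance is a one-time docker compose run --rm certbot certonly --webroot … request, which cloud/deploy.sh automates in its issue/renew step. On a fresh deploy, the nginx-init service first generates a self-signed bootstrap cert so nginx can start before Let's Encrypt has issued a real one. Renewal is automatic.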
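The commit note above says cloud/deploy.sh checks cert state and reissues only when the current issuer no longer matches ACME_CA_URI. It also says nginx-init supplies a self-signed bootstrap cert on fresh deploys. Below is a minimal Python sketch of that decision. It rests on assumptions not taken from deploy.sh itself:

- A cert whose issuer equals its subject is treated as the bootstrap cert.
- ACME_CA_URI is reduced to a hypothetical want_staging flag, and Let's Encrypt staging issuers are recognized by a "(STAGING)" marker in their name.
- The renewal window is 30 days, matching certbot's usual behaviour for 90-day certificates.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class CertAction(Enum):
    KEEP = "keep"
    ISSUE = "issue"  # request a fresh cert (certonly --webroot)
    RENEW = "renew"  # current cert is from the right CA but close to expiry


# Assumed renewal window: renew once fewer than 30 days of validity remain.
RENEW_BEFORE = timedelta(days=30)


def cert_action(issuer: str, subject: str, not_after: datetime,
                want_staging: bool, now: Optional[datetime] = None) -> CertAction:
    """Decide what to do with the cert currently in the nginx-certs volume.

    issuer and subject are distinguished names as strings; not_after must be
    timezone-aware. want_staging is a hypothetical stand-in for ACME_CA_URI.
    """
    now = now or datetime.now(timezone.utc)
    if issuer == subject:
        # Self-signed: the bootstrap cert written by nginx-init on a fresh deploy.
        return CertAction.ISSUE
    # Assumption: Let's Encrypt staging issuers carry a "(STAGING)" marker.
    if ("(STAGING)" in issuer) != want_staging:
        # Issued by the other CA environment (staging vs production).
        return CertAction.ISSUE
    if not_after - now < RENEW_BEFORE:
        return CertAction.RENEW
    return CertAction.KEEP
```

Separating ISSUE from RENEW keeps a valid production cert from being replaced on every run. This mirrors the idempotence the deploy script aims for.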

For cloud-internal hostnames that Let's Encrypt cannot validate over HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by sql. This is out of scope for the first deploy.

Why segment

  • Blast radius: a compromised node-red on app cannot reach influxdb on data unless an explicit attachment is declared. Each service's reachability can be audited from its networks: key alone.
  • Defense in depth: only nginx and wireguard-server bind host ports. No accidental 0.0.0.0 exposures.
  • NIS2 / utility audit: WBD is in scope as a water-sector entity. The Compose network definitions document the segmentation on paper, and Docker enforces it at runtime.

Special cases

Postfix (cloud + edge)

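Postfix is outbound-only. It initiates SMTP to internet MX servers but accepts no inbound connections from the internet: no ingress, no published port, no internet-facing listener. It only needs egress, which it gets through host NAT on the app network. Services attached solely to the internal: true data network do not have this egress.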

MQTT — RabbitMQ for public traffic, dedicated mosquitto inside FROST

  • RabbitMQ is the only public MQTT broker. SCADA / IoT / edge clients connect to mqtt.wbd-rd.nl:8883 (TLS, via nginx stream {} block proxying to rabbitmq:1883). Authentication uses the standard RABBITMQ_USER/PASS.
  • frost-mosquitto lives inside the frost stack on the private frost-internal docker network — it is purely the message bus between frost-http and frost-mqtt. It is not reachable from anywhere outside the frost stack.
  • SensorThings-protocol MQTT (the FROST native MQTT API) is exposed to clients through frost-mqtt's WebSocket port (9876), which nginx proxies as wss://frost.wbd-rd.nl/mqtt.

If FROST consumers also need to see SCADA traffic on RabbitMQ, add a RabbitMQ shovel plugin pointing into the frost stack. Not wired up by default.
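Clients of the FROST MQTT API address entities with topics that mirror the SensorThings REST paths. Below is a minimal sketch of the topic for one Datastream's Observations. Subscribing to this topic delivers new Observations. Publishing a JSON Observation to it asks the server to create one, if the server permits MQTT create. Assumptions: the API version is v1.1 (adjust the prefix if the server exposes v1.0), the Datastream id is hypothetical, and the function names are illustrative.

```python
import json
from datetime import datetime, timezone

# Assumption: the server exposes SensorThings API v1.1; use "v1.0" otherwise.
STA_VERSION = "v1.1"


def observations_topic(datastream_id: int, version: str = STA_VERSION) -> str:
    """Topic carrying the Observations of one Datastream.

    Subscribing delivers Observations as they are created; publishing a JSON
    Observation to the same topic asks the server to create one.
    """
    return f"{version}/Datastreams({datastream_id})/Observations"


def observation_payload(result: float, phenomenon_time: datetime) -> str:
    """Minimal Observation body: a result and its UTC phenomenonTime."""
    ts = phenomenon_time.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return json.dumps({"phenomenonTime": ts, "result": result})


if __name__ == "__main__":
    # Hypothetical Datastream id and reading, for illustration only.
    print(observations_topic(42))
    print(observation_payload(1.5, datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)))
```

Over the WebSocket listener, a client such as paho-mqtt would use transport="websockets", ws_set_options(path="/mqtt") and tls_set(), then connect to frost.wbd-rd.nl on port 443.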

Gitea — HTTPS only

No SSH ingress. GITEA__server__DISABLE_SSH=true. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push.
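Gitea maps environment variables named GITEA__<section>__<KEY> onto the matching app.ini setting. This is how GITEA__server__DISABLE_SSH=true becomes DISABLE_SSH = true under [server]. Below is a simplified Python sketch of that mapping, useful when reviewing what a compose file actually configures. It does not handle the escape sequences Gitea supports for special characters in section names. The empty-section form for top-level keys is an assumption. The function names are hypothetical.

```python
from typing import Optional, Tuple

PREFIX = "GITEA__"


def env_to_ini(name: str, value: str) -> Optional[Tuple[str, str, str]]:
    """Map GITEA__<section>__<KEY> to (section, key, value); None otherwise.

    Assumed convention: an empty section ("GITEA____APP_NAME") targets the
    top-level keys. Escape sequences in section names are not handled.
    """
    if not name.startswith(PREFIX):
        return None
    section, sep, key = name[len(PREFIX):].partition("__")
    if not sep or not key:
        return None
    return section, key, value


def render_ini(env: dict) -> str:
    """Group the mapped variables into app.ini-style text."""
    sections: dict = {}
    for name, value in env.items():
        mapped = env_to_ini(name, value)
        if mapped is not None:
            section, key, val = mapped
            sections.setdefault(section, []).append(f"{key} = {val}")
    lines = []
    for section in sorted(sections):  # "" (top-level keys) sorts first
        if section:
            lines.append(f"[{section}]")
        lines.extend(sections[section])
    return "\n".join(lines)
```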

WireGuard server (cloud)

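WireGuard runs over connectionless UDP, and the tunnel itself authenticates and encrypts every packet (cryptokey routing). Proxying it through the nginx stream {} block would cause three problems:

  • It would hide each peer's real source address from the server. WireGuard uses that address to track peer endpoints behind NAT.
  • It would complicate MTU handling.
  • It would add no security benefit.

The server therefore publishes udp/51820 directly. This is the only non-nginx public ingress on the cloud.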

Stacks

The repo defines 16 stacks under stacks/:

  • Cloud + edge: nginx-proxy, node-red, influxdb, grafana, keycloak, portainer, rabbitmq, postfix
  • Cloud-only: wireguard-server, gitea (HTTPS), jenkins, sql (postgres), mlflow, jupyterhub, frost (SensorThings, dedicated postgis + internal bus)
  • Edge-only: wireguard-client
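Which stacks each layer pulls in follows from the three groups above. Below is a minimal sketch that derives each layer's stack list and renders a Compose include: section for it. Assumptions: the cloud composition sits one directory below the repo root, hence the hypothetical ../stacks default. A site under sites/<plant>/ would pass ../../stacks instead. The helper names are illustrative, and a real site may include fewer stacks.

```python
from pathlib import PurePosixPath

# Stack folders under stacks/, grouped as in the list above.
SHARED = ["nginx-proxy", "node-red", "influxdb", "grafana",
          "keycloak", "portainer", "rabbitmq", "postfix"]
CLOUD_ONLY = ["wireguard-server", "gitea", "jenkins", "sql",
              "mlflow", "jupyterhub", "frost"]
EDGE_ONLY = ["wireguard-client"]


def stacks_for(layer: str) -> list:
    """Stacks a layer's composition pulls in: shared stacks plus its own."""
    own = {"cloud": CLOUD_ONLY, "edge": EDGE_ONLY}[layer]
    return SHARED + own


def include_block(layer: str, stacks_dir: str = "../stacks") -> str:
    """Render a Compose `include:` section listing each stack's compose.yml."""
    lines = ["include:"]
    for stack in stacks_for(layer):
        lines.append(f"  - {PurePosixPath(stacks_dir) / stack / 'compose.yml'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(include_block("cloud"))
```

With these groups, the cloud includes 15 stacks and an edge site includes 9.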

Sites

| Site | Status |
|------|--------|
| gemaal1 | Scaffolded; awaiting hardware provisioning (PLANT_LAN_IP, WG peer key, OPCUA endpoint) |

Additional plants follow the same pattern under sites/<plant>/.

Conventions

  • Folder names: kebab-case (node-red, nginx-proxy, wireguard-server).
  • Compose filename: compose.yml (official Compose Spec).
  • Composition: the cloud and site compose files pull in stacks via include:. Stacks remain runnable standalone for testing.
  • Secrets: .env (gitignored) + .env.example (committed with placeholders).
  • Per-stack contents: compose.yml, .env.example, README.md, optional config/.
  • OT layer: out of scope; PLC + OPCUA managed in a separate process.
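The folder-name and per-stack-file conventions are mechanical enough to lint. Below is a minimal sketch, using only the standard library, that assumes it runs from the repo root. The helper names are hypothetical. The check is split into scan(), which reads the disk, and lint(), which is a pure check. Only the required files are enforced; the optional config/ folder and a local, gitignored .env are not flagged.

```python
import re
from pathlib import Path

KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
REQUIRED_FILES = (".env.example", "README.md", "compose.yml")


def scan(stacks_dir: Path) -> dict:
    """Map each stack folder name to the set of file names directly inside it."""
    return {
        d.name: {f.name for f in d.iterdir() if f.is_file()}
        for d in stacks_dir.iterdir() if d.is_dir()
    }


def lint(stacks: dict) -> list:
    """Report stacks that break the folder-name or per-stack-file conventions."""
    problems = []
    for name in sorted(stacks):
        if not KEBAB_CASE.fullmatch(name):
            problems.append(f"{name}: folder name is not kebab-case")
        for required in REQUIRED_FILES:
            if required not in stacks[name]:
                problems.append(f"{name}: missing {required}")
    return problems


if __name__ == "__main__":
    root = Path("stacks")
    if root.is_dir():
        for problem in lint(scan(root)):
            print(problem)
```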

Open decisions

Tracked here so we don't forget. Each lands when we harden the relevant stack.

  • MinIO / artifact store — MLflow uses a local volume for now; switch to an S3-compatible MinIO sidecar when artifacts grow.
  • JupyterHub auth — target Keycloak OIDC via oauthenticator.generic.GenericOAuthenticator.
  • WG client routing — split-tunnel vs full; per-peer AllowedIPs policy.
  • FROST auth — currently BasicAuthProvider against the USERS table in frost-db; swap to Keycloak OIDC via the FROST OIDC plugin when SSO is rolled out.
  • MQTT cross-broker shovel — only if FROST consumers must see RabbitMQ traffic or vice versa.
  • Internal PKI — for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
  • Backup strategy — for sql (postgres), influxdb, gitea-data, jenkins-home, mlflow-artifacts.
  • Provision Gemaal1 — fill in PLANT_LAN_IP, WG peer key, OPCUA endpoint, deploy first stacks.