infra/docs/architecture.md
znetsixe f69453df99 refactor(dns): rename frost.wbd-rd.nl → sta.wbd-rd.nl; drop redundant portainer.wbd-rd.nl
Match the short-functional naming convention used by the other vhosts
(git, auth, dash, flow, ml, hub, ops, mq, ci, mqtt). FROST implements
OGC SensorThings API, so `sta` is the natural fit.

portainer.wbd-rd.nl is dropped from deploy.sh HOSTS — there is no
nginx vhost for it; portainer is already served via ops.wbd-rd.nl.

DNS prereq for first deploy is now: create one new A record for
sta.wbd-rd.nl → cloud public IP. All other short subdomains already
point correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:46:32 +02:00

Architecture

R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:

  • Cloud layer — central services, one deployment, internet-facing.
  • Edge layer — per-plant, plant-LAN-facing, tunneled to cloud via WireGuard.
  • OT layer — per-plant, behind edge. Managed outside this repo.
                Internet
                    │
        ┌───────────┴───────────┐
        │ tcp/80, 443, 8883     │
        │ udp/51820             │
        ▼                       │
┌────────────────────────────────────┐
│ Cloud (central, one)               │
│   nginx + certbot ◀── 80/443/8883  │
│   wireguard-server ◀── 51820/udp   │
│   gitea, jenkins, keycloak, ...    │
│   influxdb, grafana, node-red      │
│   rabbitmq, postfix, portainer     │
│   sql (postgres, single config)    │
│   mlflow, jupyterhub               │
│   frost (SensorThings API)         │
└───────────────┬────────────────────┘
                │ WireGuard tunnels
        ┌───────┼────────┬───────────┐
        ▼       ▼        ▼           ▼
    ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
    │ Edge: │ │ Edge: │ │ Edge: │ │  ...  │
    │gemaal1│ │ ...   │ │ ...   │ │       │
    └───┬───┘ └───────┘ └───────┘ └───────┘
        │ TLS
        ▼
    ┌──────────┐
    │ OT       │  ← out of scope of this repo
    │ OPCUA    │
    │ PLC      │
    └──────────┘

Network topology (per layer)

Each layer defines the same four Docker networks (the cloud frost stack adds its own private frost-internal network):

| Network | Purpose | Notes |
| --- | --- | --- |
| edge | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| app | Application / automation tier. | Default landing for app services. |
| data | Databases (influxdb, sql). | internal: true; no internet egress. |
| mgmt | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |
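
As a rough illustration of how these four networks might be declared, here is a minimal Compose sketch. The network names and the internal: true setting come from the table; the image names and the choice of three example services are assumptions made for illustration, not the repo's actual files.

```yaml
# Sketch only: image names below are assumed, not taken from the repo.
networks:
  edge: {}            # outermost; only port-publishing containers attach
  app: {}             # default tier for application services
  data:
    internal: true    # no external route, so no internet egress
  mgmt: {}            # identity / control plane

services:
  node-red:
    image: nodered/node-red    # assumed image
    networks: [app]            # app only: no path to the data network
  influxdb:
    image: influxdb:2          # assumed tag
    networks: [data]           # reachable only by services that also join data
  grafana:
    image: grafana/grafana     # assumed image
    networks: [app, data]      # explicit, auditable bridge between the two tiers
```

Reading the networks: keys alone is enough to see which services can talk to each other.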

Cloud attachments

edge   : nginx, wireguard-server
app    : nginx, rabbitmq, postfix, node-red, grafana,
         jenkins, gitea, keycloak, mlflow, jupyterhub,
         portainer, frost-http, frost-mqtt
data   : influxdb, sql, grafana, mlflow
mgmt   : portainer, keycloak, wireguard-server, jupyterhub
frost-internal (private to frost stack) :
         frost-db (postgis), frost-mosquitto, frost-http, frost-mqtt

Edge attachments

edge   : nginx                                  ← plant-LAN-facing
app    : nginx, rabbitmq, postfix, node-red,
         grafana, keycloak, wireguard-client
data   : influxdb, grafana
mgmt   : portainer, keycloak, wireguard-client

Ingress (the only ports facing outside)

Cloud (the central host)

| Port | Container | Notes |
| --- | --- | --- |
| tcp/80 | nginx | HTTP → 301 to 443; also serves /.well-known/acme-challenge/ for certbot |
| tcp/443 | nginx | All HTTPS UIs; TLS termination |
| tcp/8883 | nginx | MQTT-TLS via stream {} block; SNI route to rabbitmq:1883 |
| udp/51820 | wireguard-server | VPN tunnel ingress |

Two containers publish a total of four ports. Everything else is invisible from outside the host.
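
The four published ports could be declared roughly as in the sketch below. The port numbers, the nginx image tag, and the network attachments follow this document; the wireguard image name is an assumption, and the real compose files may be organised differently.

```yaml
# Sketch only: the wireguard image is assumed; ports and networks follow this document.
networks:
  edge: {}
  app: {}
  mgmt: {}

services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "80:80"        # HTTP redirect + ACME challenge
      - "443:443"      # HTTPS UIs
      - "8883:8883"    # MQTT-TLS via the stream {} block
    networks: [edge, app]

  wireguard-server:
    image: linuxserver/wireguard   # assumed image
    ports:
      - "51820:51820/udp"          # VPN tunnel ingress
    networks: [edge, mgmt]

  # No other service declares ports:, so nothing else is bound on the host.
```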

Edge (per-plant gateway)

| Port | Container | Bound to |
| --- | --- | --- |
| tcp/80 | nginx | Plant-LAN interface only |
| tcp/443 | nginx | Plant-LAN interface only |

The edge wireguard-client initiates outbound to the cloud — it publishes no port.
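
One way to bind the edge nginx to the plant-LAN interface only is to prefix each published port with the host IP. The fragment below is a sketch under that assumption; it treats PLANT_LAN_IP (named in the Sites table) as a variable supplied through the site's .env, which may not match how the repo actually wires it.

```yaml
# Sketch only: assumes PLANT_LAN_IP is provided via the site's .env file.
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      # HOST_IP:HOST_PORT:CONTAINER_PORT keeps the listener off 0.0.0.0
      - "${PLANT_LAN_IP}:80:80"
      - "${PLANT_LAN_IP}:443:443"

  # wireguard-client has no ports: entry; it only dials out to the cloud.
```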

TLS strategy

Stock nginx + certbot sidecar (Let's Encrypt, HTTP-01 webroot).

  • Stock nginx:1.27-alpine — required because we use the stream {} context for MQTT-TLS. nginxproxy/nginx-proxy (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly.
  • certbot/certbot sidecar runs certbot renew every 12h. Shared nginx-certs + nginx-acme-challenge volumes coordinate cert + challenge state between the two containers.
  • Initial issuance is manual (one-time docker compose run --rm certbot certonly --webroot …). Renewal is automatic.
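
The volume sharing between the two containers might look like the sketch below. The volume names and images come from the bullets above; the mount paths are assumptions (they are common defaults, not confirmed by this document), and the 12h renewal scheduling is deliberately left out because the repo's mechanism is not described here.

```yaml
# Sketch only: mount paths are assumed; renewal scheduling is omitted.
services:
  nginx:
    image: nginx:1.27-alpine
    volumes:
      - nginx-certs:/etc/letsencrypt:ro            # nginx reads issued certificates
      - nginx-acme-challenge:/var/www/certbot:ro   # nginx serves HTTP-01 challenge files

  certbot:
    image: certbot/certbot
    volumes:
      - nginx-certs:/etc/letsencrypt               # certbot writes certificates
      - nginx-acme-challenge:/var/www/certbot      # certbot writes challenge files

volumes:
  nginx-certs:
  nginx-acme-challenge:
```

Mounting both volumes read-only on the nginx side keeps certbot as the only writer of certificate state.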

For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by sql. Out of scope for first deploy.

Why segment

  • Blast radius: a compromised node-red on app cannot reach influxdb on data unless an explicit attachment is declared. Each service's reachability is auditable from networks: alone.
  • Defense in depth: only nginx and wireguard-server bind host ports. No accidental 0.0.0.0 exposures.
  • NIS2 / utility audit: WBD is in scope as a water-sector entity. Compose networks provide evidence of segmentation both at runtime and on paper.

Special cases

Postfix (cloud + edge)

Postfix is outbound-only: it initiates SMTP connections to internet MX servers and accepts no inbound mail. It publishes no port and has no internet-facing listener. It only needs egress, which containers on non-internal networks get via host NAT.

MQTT — RabbitMQ for public traffic, dedicated mosquitto inside FROST

  • RabbitMQ is the only public MQTT broker. SCADA / IoT / edge clients connect to mqtt.wbd-rd.nl:8883 (TLS, via nginx stream {} block proxying to rabbitmq:1883). Authentication uses the standard RABBITMQ_USER/PASS.
  • frost-mosquitto lives inside the frost stack on the private frost-internal docker network — it is purely the message bus between frost-http and frost-mqtt. It is not reachable from anywhere outside the frost stack.
  • SensorThings-protocol MQTT (the FROST native MQTT API) is exposed to clients via frost-mqtt's WebSocket port, proxied as https://sta.wbd-rd.nl/mqtt.

If FROST consumers also need to see SCADA traffic on RabbitMQ, add a RabbitMQ shovel plugin pointing into the frost stack. Not wired up by default.

Gitea — HTTPS only

No SSH ingress: GITEA__server__DISABLE_SSH=true. All clones go over HTTPS through nginx (the nginx-proxy stack). Re-evaluate only if Gitea Actions runners require SSH push.

WireGuard server (cloud)

WireGuard is connectionless UDP with cryptokey-routed packets. Proxying it through the nginx stream {} block would complicate NAT and MTU handling and add no security benefit. The server therefore publishes udp/51820 directly; it is the only non-nginx public ingress on cloud.

Stacks

The repo defines 16 stacks under stacks/:

  • Cloud + edge: nginx-proxy, node-red, influxdb, grafana, keycloak, portainer, rabbitmq, postfix
  • Cloud-only: wireguard-server, gitea (HTTPS), jenkins, sql (postgres), mlflow, jupyterhub, frost (SensorThings, dedicated postgis + internal bus)
  • Edge-only: wireguard-client

Sites

| Site | Status |
| --- | --- |
| gemaal1 | Scaffolded; awaiting hardware provisioning (PLANT_LAN_IP, WG peer key, OPCUA endpoint) |

Additional plants follow the same pattern under sites/<plant>/.

Conventions

  • Folder names: kebab-case (node-red, nginx-proxy, wireguard-server).
  • Compose filename: compose.yml (official Compose Spec).
  • Composition: cloud/site composes pull stacks via include:. Stacks remain runnable standalone for testing.
  • Secrets: .env (gitignored) + .env.example (committed with placeholders).
  • Per-stack contents: compose.yml, .env.example, README.md, optional config/.
  • OT layer: out of scope; PLC + OPCUA managed in a separate process.
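
To illustrate the composition convention, a site-level compose file pulling in stacks via include: might look like the sketch below. The file location and relative paths are hypothetical, and the top-level include: element needs a reasonably recent Docker Compose (v2.20 or later).

```yaml
# Sketch only: hypothetical file sites/gemaal1/compose.yml; relative paths are assumed.
include:
  - ../../stacks/nginx-proxy/compose.yml
  - ../../stacks/wireguard-client/compose.yml
  - ../../stacks/node-red/compose.yml
  - ../../stacks/influxdb/compose.yml
```

Because each stack keeps its own compose.yml, it can still be brought up on its own for testing, for example with docker compose -f stacks/node-red/compose.yml up.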

Open decisions

Tracked here so we don't forget. Each lands when we harden the relevant stack.

  • MinIO / artifact store — MLflow uses local volume for now; switch to S3-compatible MinIO sidecar when artifacts grow.
  • JupyterHub auth — target Keycloak OIDC via oauthenticator.generic.GenericOAuthenticator.
  • WG client routing — split-tunnel vs full; per-peer AllowedIPs policy.
  • FROST auth — currently BasicAuthProvider against the USERS table in frost-db; swap to Keycloak OIDC via the FROST OIDC plugin when SSO is rolled out.
  • MQTT cross-broker shovel — only if FROST consumers must see RabbitMQ traffic or vice versa.
  • Internal PKI — for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
  • Backup strategy — for sql (postgres), influxdb, gitea-data, jenkins-home, mlflow-artifacts.
  • Provision Gemaal1 — fill in PLANT_LAN_IP, WG peer key, OPCUA endpoint, deploy first stacks.