infra/docs/architecture.md
znetsixe 2f5e3b4183 feat: SQL=postgres, nginx+certbot, MQTT split, ML stacks, gitea HTTPS-only, gemaal1 site
Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks.

Locked decisions
- sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning
- nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO).
  Chose stock over nginxproxy/nginx-proxy because stream{} is required for
  MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883.
- gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published.

MQTT split
- Remove stacks/mqtt placeholder.
- Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin),
  used at both cloud and edge. External MQTT clients reach cloud broker
  via nginx stream-proxy on 8883.
- Add stacks/mosquitto — reserved for the FROST (SensorThings) stack
  only. Cloud-only. Internal to its own stack; no external ingress.

ML / notebooks (cloud-only)
- stacks/mlflow — experiment tracking + model registry. Postgres backend
  on sql stack; local volume for artifacts (S3/MinIO is a TODO).
- stacks/jupyterhub — multi-user notebook server. DockerSpawner via
  mounted docker.sock; users spawn into cloud-app network so they can
  reach mlflow, influxdb (via grafana), rabbitmq.

Sites
- sites/gemaal1 — first edge deployment scaffold. Site-local override
  template for binding nginx to PLANT_LAN_IP.

Docs
- README + docs/architecture.md updated: stacks table now lists 15 stacks,
  ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy
  section locked, MQTT-split section added, Gitea HTTPS-only noted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 13:22:46 +02:00

Architecture

R&D infrastructure for Waterschap Brabantse Delta (WBD), laid out as a hub-and-spoke topology with three layers:

  • Cloud layer — central services, one deployment, internet-facing.
  • Edge layer — per-plant, plant-LAN-facing, tunneled to cloud via WireGuard.
  • OT layer — per-plant, behind edge. Managed outside this repo.
                Internet
                    │
        ┌───────────┴───────────┐
        │ tcp/80, 443, 8883     │
        │ udp/51820             │
        ▼                       │
┌────────────────────────────────────┐
│ Cloud (central, one)               │
│   nginx + certbot ◀── 80/443/8883  │
│   wireguard-server ◀── 51820/udp   │
│   gitea, jenkins, keycloak, ...    │
│   influxdb, grafana, node-red      │
│   rabbitmq, postfix, portainer     │
│   sql (postgres, single config)    │
│   mlflow, jupyterhub               │
│   mosquitto (FROST stack only)     │
└───────────────┬────────────────────┘
                │ WireGuard tunnels
        ┌───────┼────────┬───────────┐
        ▼       ▼        ▼           ▼
    ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
    │ Edge: │ │ Edge: │ │ Edge: │ │  ...  │
    │gemaal1│ │ ...   │ │ ...   │ │       │
    └───┬───┘ └───────┘ └───────┘ └───────┘
        │ TLS
        ▼
    ┌──────────┐
    │ OT       │  ← out of scope of this repo
    │ OPCUA    │
    │ PLC      │
    └──────────┘

Network topology (per layer)

Each layer defines four Docker networks. Only data is marked internal: true; the other three are ordinary bridge networks:

| Network | Purpose | Notes |
|---------|---------|-------|
| edge | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| app | Application / automation tier. | Default landing for app services. |
| data | Databases (influxdb, sql). | internal: true, so no internet egress. |
| mgmt | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |
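
As a rough sketch, the four networks could be declared like this in a layer's compose.yml. The network names come from the table above; the bridge driver and the exact layout are assumptions, not a copy of the repo's file.

```yaml
# Sketch only: top-level network declarations for one layer.
networks:
  edge:              # outermost; only port-publishing services join
    driver: bridge
  app:               # default landing network for application services
    driver: bridge
  data:              # databases
    driver: bridge
    internal: true   # externally isolated: no route to the outside world
  mgmt:              # identity and control plane
    driver: bridge
```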

Cloud attachments

edge   : nginx, wireguard-server
app    : nginx, rabbitmq, postfix, node-red, grafana,
         jenkins, gitea, keycloak, mlflow, jupyterhub
data   : influxdb, sql, grafana, mlflow
mgmt   : portainer, keycloak, wireguard-server, jupyterhub

(mosquitto joins app only when the FROST stack is deployed.)
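
A minimal sketch of how such attachments look at the service level, using three services from the list above. The image names are assumptions and everything except networks: is omitted.

```yaml
# Sketch only: grafana bridges app and data; node-red and influxdb do not.
services:
  node-red:
    image: nodered/node-red   # assumed image
    networks: [app]           # cannot reach influxdb directly
  grafana:
    image: grafana/grafana    # assumed image
    networks: [app, data]     # reachable from nginx on app, can query data
  influxdb:
    image: influxdb           # assumed image
    networks: [data]          # only visible to services that join data

networks:
  app: {}
  data:
    internal: true
```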

Edge attachments

edge   : nginx                                  ← plant-LAN-facing
app    : nginx, rabbitmq, postfix, node-red,
         grafana, keycloak, wireguard-client
data   : influxdb, grafana
mgmt   : portainer, keycloak, wireguard-client

Ingress (the only ports facing outside)

Cloud (the central host)

| Port | Container | Notes |
|------|-----------|-------|
| tcp/80 | nginx | HTTP → 301 to 443; also serves /.well-known/acme-challenge/ for certbot |
| tcp/443 | nginx | All HTTPS UIs; TLS termination |
| tcp/8883 | nginx | MQTT-TLS via stream {} block; SNI route to rabbitmq:1883 |
| udp/51820 | wireguard-server | VPN tunnel ingress |

Two containers publish a total of four ports. Everything else is invisible from outside the host.
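
The four published ports might be expressed roughly as follows. This is a trimmed sketch: the wireguard image and its cap_add setting are assumptions, and all other service settings are left out.

```yaml
# Sketch only: the two cloud services that carry a ports: section.
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "80:80"           # HTTP redirect + ACME challenge
      - "443:443"         # HTTPS UIs
      - "8883:8883"       # MQTT-TLS via the stream block
    networks: [edge, app]
  wireguard-server:
    image: linuxserver/wireguard   # assumed image, not named in this doc
    cap_add: [NET_ADMIN]           # assumed; needed to manage the interface
    ports:
      - "51820:51820/udp" # VPN tunnel ingress
    networks: [edge, mgmt]

networks:
  edge: {}
  app: {}
  mgmt: {}
```

No other service should declare ports:, which makes accidental exposure easy to spot in review.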

Edge (per-plant gateway)

| Port | Container | Bound to |
|------|-----------|----------|
| tcp/80 | nginx | Plant-LAN interface only |
| tcp/443 | nginx | Plant-LAN interface only |

The edge wireguard-client initiates outbound to the cloud — it publishes no port.
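
One way a site-local override could pin nginx to the plant-LAN interface is to prefix the published ports with PLANT_LAN_IP. The snippet below is a sketch under that assumption; the actual override template in sites/gemaal1 may differ.

```yaml
# Sketch only: site-local override binding nginx to one host address.
# PLANT_LAN_IP is read from .env; the :? form aborts if it is unset.
services:
  nginx:
    ports:
      - "${PLANT_LAN_IP:?PLANT_LAN_IP must be set}:80:80"
      - "${PLANT_LAN_IP:?PLANT_LAN_IP must be set}:443:443"
```

Without the address prefix, Docker publishes on all host interfaces, which is what the edge layer is meant to avoid.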

TLS strategy

Stock nginx + certbot sidecar (Let's Encrypt, HTTP-01 webroot).

  • Stock nginx:1.27-alpine — required because we use the stream {} context for MQTT-TLS. nginxproxy/nginx-proxy (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly.
  • certbot/certbot sidecar runs certbot renew every 12h. Shared nginx-certs + nginx-acme-challenge volumes coordinate cert + challenge state between the two containers.
  • Initial issuance is manual (one-time docker compose run --rm certbot certonly --webroot …). Renewal is automatic.
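
A common shape for such a sidecar is sketched below. The volume names are the ones mentioned above; the mount paths (/etc/letsencrypt, /var/www/certbot) and the shell loop are assumptions rather than the repo's actual definition.

```yaml
# Sketch only: nginx and certbot sharing cert and challenge volumes.
services:
  nginx:
    image: nginx:1.27-alpine
    volumes:
      - nginx-certs:/etc/letsencrypt:ro            # read certs
      - nginx-acme-challenge:/var/www/certbot:ro   # serve HTTP-01 challenge files
  certbot:
    image: certbot/certbot
    volumes:
      - nginx-certs:/etc/letsencrypt
      - nginx-acme-challenge:/var/www/certbot
    # The image's default entrypoint is certbot itself; replace it with a
    # shell loop that attempts renewal, then sleeps 12 hours.
    entrypoint: /bin/sh -c
    command:
      - "trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done"

volumes:
  nginx-certs:
  nginx-acme-challenge:
```

nginx still has to be reloaded before it serves a renewed certificate; that step is not shown here.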

For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by sql. Out of scope for first deploy.

Why segment

  • Blast radius: a compromised node-red on app cannot reach influxdb on data unless an explicit attachment is declared. Each service's reachability is auditable from networks: alone.
  • Defense in depth: only nginx and wireguard-server bind host ports. No accidental 0.0.0.0 exposures.
  • NIS2 / utility audit: as a water-sector entity, WBD is in scope. The Compose network definitions serve as evidence of segmentation both on paper and at runtime.

Special cases

Postfix (cloud + edge)

Postfix is outbound-only. It initiates SMTP to internet MX servers and accepts no inbound mail: zero ingress, no published port, no internet-facing listener. It only needs egress, which containers on non-internal networks such as app get through host NAT.

MQTT — two brokers

  • RabbitMQ is the general-purpose broker. Runs at both cloud and edge. MQTT plugin enabled. Cloud-side reachable externally via nginx stream proxy on tcp/8883. Edge-side fully internal.
  • Mosquitto is reserved for the FROST (SensorThings API) stack only — cloud-only. Internal to its own stack — no external ingress unless FROST publishers need to push from outside (in which case use a separate stream block on a different port).

If FROST needs cross-broker forwarding, add a RabbitMQ shovel plugin pointing at mosquitto. Not wired up by default.
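
The cloud-side stream proxy could look roughly like the sketch below, written as an inline Compose config so it stays in one file (this needs a Compose release that supports configs with content:). The hostname mqtt.example.org and the certificate paths are placeholders, the http {} part of the real nginx.conf is omitted, and SNI-based routing is not shown.

```yaml
# Sketch only: nginx terminates TLS on 8883 and forwards plain MQTT.
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "8883:8883"
    configs:
      - source: nginx_stream_sketch
        target: /etc/nginx/nginx.conf
    volumes:
      - nginx-certs:/etc/letsencrypt:ro
    networks: [edge, app]

configs:
  nginx_stream_sketch:
    content: |
      events {}
      # stream {} must sit at the top level of nginx.conf, not inside http {}.
      stream {
        server {
          listen 8883 ssl;
          ssl_certificate     /etc/letsencrypt/live/mqtt.example.org/fullchain.pem;
          ssl_certificate_key /etc/letsencrypt/live/mqtt.example.org/privkey.pem;
          proxy_pass rabbitmq:1883;   # resolved over the app network
        }
      }

volumes:
  nginx-certs:
networks:
  edge: {}
  app: {}
```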

Gitea — HTTPS only

No SSH ingress. GITEA__server__DISABLE_SSH=true. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push.
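
In Compose terms this amounts to one environment variable and the absence of a ports: section. A minimal sketch, with the image name as an assumption:

```yaml
# Sketch only: Gitea with its built-in SSH server disabled.
services:
  gitea:
    image: gitea/gitea            # assumed image
    environment:
      GITEA__server__DISABLE_SSH: "true"   # maps to [server] DISABLE_SSH in app.ini
    networks: [app]               # reached only through nginx; no ports: section

networks:
  app: {}
```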

WireGuard server (cloud)

WireGuard is connectionless UDP and routes packets by peer public key (cryptokey routing). Proxying it through the nginx stream module would complicate NAT and MTU handling and add no security benefit. The server therefore publishes udp/51820 directly, the only non-nginx public ingress on cloud.

Stacks

The repo defines 16 stacks under stacks/:

  • Cloud + edge: nginx-proxy, node-red, influxdb, grafana, keycloak, portainer, rabbitmq, postfix
  • Cloud-only: wireguard-server, gitea (HTTPS), jenkins, sql (postgres), mlflow, jupyterhub, mosquitto (FROST)
  • Edge-only: wireguard-client

Sites

| Site | Status |
|------|--------|
| gemaal1 | Scaffolded; awaiting hardware provisioning (PLANT_LAN_IP, WG peer key, OPCUA endpoint) |

Additional plants follow the same pattern under sites/<plant>/.

Conventions

  • Folder names: kebab-case (node-red, nginx-proxy, wireguard-server).
  • Compose filename: compose.yml (official Compose Spec).
  • Composition: cloud/site composes pull stacks via include:. Stacks remain runnable standalone for testing.
  • Secrets: .env (gitignored) + .env.example (committed with placeholders).
  • Per-stack contents: compose.yml, .env.example, README.md, optional config/.
  • OT layer: out of scope; PLC + OPCUA managed in a separate process.
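
The composition convention can be illustrated with a top-level include: list. The file location and relative paths below are hypothetical; only the stack names are taken from this document.

```yaml
# Sketch only: a cloud-level compose.yml (hypothetical location: cloud/compose.yml)
# that pulls in individual stacks. Paths resolve relative to this file.
include:
  - ../stacks/nginx-proxy/compose.yml
  - ../stacks/sql/compose.yml
  - ../stacks/gitea/compose.yml
  - ../stacks/mlflow/compose.yml
```

Each included file remains a complete Compose project on its own, which is what keeps a stack runnable standalone for testing.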

Open decisions

Tracked here so we don't forget. Each lands when we harden the relevant stack.

  • MinIO / artifact store — MLflow uses local volume for now; switch to S3-compatible MinIO sidecar when artifacts grow.
  • JupyterHub auth — target Keycloak OIDC via oauthenticator.generic.GenericOAuthenticator.
  • WG client routing — split-tunnel vs full; per-peer AllowedIPs policy.
  • MQTT cross-broker shovel — only if FROST consumers must see RabbitMQ traffic or vice versa.
  • Internal PKI — for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
  • Backup strategy — for sql (postgres), influxdb, gitea-data, jenkins-home, mlflow-artifacts.
  • Provision Gemaal1 — fill in PLANT_LAN_IP, WG peer key, OPCUA endpoint, deploy first stacks.