Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks.
Locked decisions
- sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning
- nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO).
Chose stock over nginxproxy/nginx-proxy because stream{} is required for
MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883.
- gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published.
MQTT split
- Remove stacks/mqtt placeholder.
- Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin),
used at both cloud and edge. External MQTT clients reach cloud broker
via nginx stream-proxy on 8883.
- Add stacks/mosquitto — reserved for the FROST (SensorThings) stack
only. Cloud-only. Internal to its own stack; no external ingress.
ML / notebooks (cloud-only)
- stacks/mlflow — experiment tracking + model registry. Postgres backend
on sql stack; local volume for artifacts (S3/MinIO is a TODO).
- stacks/jupyterhub — multi-user notebook server. DockerSpawner via
mounted docker.sock; users spawn into cloud-app network so they can
reach mlflow, influxdb (via grafana), rabbitmq.
Sites
- sites/gemaal1 — first edge deployment scaffold. Site-local override
template for binding nginx to PLANT_LAN_IP.
Docs
- README + docs/architecture.md updated: stacks table now lists 15 stacks,
ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy
section locked, MQTT-split section added, Gitea HTTPS-only noted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Architecture
R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:
- Cloud layer — central services, one deployment, internet-facing.
- Edge layer — per-plant, plant-LAN-facing, tunneled to cloud via WireGuard.
- OT layer — per-plant, behind edge. Managed outside this repo.
                  Internet
                      │
          ┌───────────┴───────────┐
          │ tcp/80, 443, 8883     │ udp/51820
          ▼                       ▼
┌──────────────────────────────────────┐
│ Cloud (central, one)                 │
│ nginx + certbot   ◀── 80/443/8883    │
│ wireguard-server  ◀── 51820/udp      │
│ gitea, jenkins, keycloak, ...        │
│ influxdb, grafana, node-red          │
│ rabbitmq, postfix, portainer         │
│ sql (postgres, single config)        │
│ mlflow, jupyterhub                   │
│ mosquitto (FROST stack only)         │
└──────────────────┬───────────────────┘
                   │ WireGuard tunnels
        ┌──────────┼──────────┬──────────┐
        ▼          ▼          ▼          ▼
    ┌───────┐  ┌───────┐  ┌───────┐  ┌───────┐
    │ Edge: │  │ Edge: │  │ Edge: │  │  ...  │
    │gemaal1│  │  ...  │  │  ...  │  │       │
    └───┬───┘  └───────┘  └───────┘  └───────┘
        │ TLS
        ▼
   ┌─────────┐
   │   OT    │ ← out of scope of this repo
   │  OPCUA  │
   │   PLC   │
   └─────────┘
Network topology (per layer)
Each layer uses four Docker networks:
| Network | Purpose | Notes |
|---|---|---|
| `edge` | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| `app` | Application / automation tier. | Default landing for app services. |
| `data` | Databases (influxdb, sql). | `internal: true`; no internet egress. |
| `mgmt` | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |
Cloud attachments
edge : nginx, wireguard-server
app  : nginx, rabbitmq, postfix, node-red, grafana,
       jenkins, gitea, keycloak, mlflow, jupyterhub
data : influxdb, sql, grafana, mlflow
mgmt : portainer, keycloak, wireguard-server, jupyterhub
(mosquitto joins app only when the FROST stack is deployed.)
Edge attachments
edge : nginx                ← plant-LAN-facing
app  : nginx, rabbitmq, postfix, node-red,
       grafana, keycloak, wireguard-client
data : influxdb, grafana
mgmt : portainer, keycloak, wireguard-client
Ingress (the only ports facing outside)
Cloud (the central host)
| Port | Container | Notes |
|---|---|---|
| `tcp/80` | nginx | HTTP → 301 to 443; also serves `/.well-known/acme-challenge/` for certbot |
| `tcp/443` | nginx | All HTTPS UIs; TLS termination |
| `tcp/8883` | nginx | MQTT-TLS via `stream {}` block; SNI route to `rabbitmq:1883` |
| `udp/51820` | wireguard-server | VPN tunnel ingress |
Two containers publish a total of four ports. Everything else is invisible from outside the host.
Edge (per-plant gateway)
| Port | Container | Bound to |
|---|---|---|
| `tcp/80` | nginx | Plant-LAN interface only |
| `tcp/443` | nginx | Plant-LAN interface only |
The edge wireguard-client initiates the tunnel outbound to the cloud, so it publishes no port.
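The four-port rule can be checked mechanically against the resolved Compose model. The Python sketch below assumes the input is the JSON produced by `docker compose config --format json`, in which each service has a `ports` list of objects carrying `published` and `protocol`. It also assumes the Compose service names are literally `nginx` and `wireguard-server`; adjust `EXPECTED_CLOUD_INGRESS` if the names differ.

```python
import json

# Expected cloud ingress from the table above, as (service, protocol, host_port).
# Assumption: the Compose service names are literally "nginx" and "wireguard-server".
EXPECTED_CLOUD_INGRESS = {
    ("nginx", "tcp", 80),
    ("nginx", "tcp", 443),
    ("nginx", "tcp", 8883),
    ("wireguard-server", "udp", 51820),
}


def load_model(path: str) -> dict:
    """Load the output of `docker compose config --format json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def published_ports(model: dict) -> set[tuple[str, str, int]]:
    """Collect (service, protocol, host_port) for every port bound on the host."""
    found = set()
    for name, service in model.get("services", {}).items():
        for port in service.get("ports", []):
            published = port.get("published")
            if published in (None, ""):
                continue  # container port only; nothing bound on the host
            # Port ranges such as "8000-8010" are not handled in this sketch.
            found.add((name, port.get("protocol", "tcp"), int(published)))
    return found


def audit(model: dict, expected: set) -> tuple[set, set]:
    """Return (unexpected, missing) host-port publications."""
    actual = published_ports(model)
    return actual - expected, expected - actual
```

Usage: run `docker compose config --format json > resolved.json` for the cloud deployment, then call `audit(load_model("resolved.json"), EXPECTED_CLOUD_INGRESS)`. If the rule holds, both returned sets are empty.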
TLS strategy
Stock nginx + certbot sidecar (Let's Encrypt, HTTP-01 webroot).
- Stock `nginx:1.27-alpine`, required because we use the `stream {}` context for MQTT-TLS. `nginxproxy/nginx-proxy` (the jwilder image) is HTTP/HTTPS-only and can't expose `stream` cleanly.
- `certbot/certbot` sidecar runs `certbot renew` every 12h.
- Shared `nginx-certs` + `nginx-acme-challenge` volumes coordinate cert + challenge state between the two containers.
- Initial issuance is manual (one-time `docker compose run --rm certbot certonly --webroot …`). Renewal is automatic.
For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by sql. Out of scope for first deploy.
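A 12h renewal cadence is cheap because `certbot renew` only reissues a certificate once it falls inside the renewal window; every other run is a no-op. The Python sketch below illustrates that rule. It assumes certbot's long-standing 30-day default (`renew_before_expiry`), which may differ between certbot releases, and it is not certbot's own code.

```python
import ssl
from datetime import datetime, timedelta, timezone

# Assumption: certbot's long-standing default renewal window of 30 days.
RENEW_BEFORE = timedelta(days=30)


def not_after(cert_time: str) -> datetime:
    """Parse an OpenSSL-style notAfter string, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(cert_time), tz=timezone.utc)


def due_for_renewal(expiry: datetime, now: datetime,
                    window: timedelta = RENEW_BEFORE) -> bool:
    """True once `now` is inside the window before `expiry`."""
    return now >= expiry - window
```

With a 12h loop, a certificate that enters the window is renewed within half a day, leaving roughly 29.5 days of slack before expiry.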
Why segment
- Blast radius: a compromised node-red on `app` cannot reach influxdb on `data` unless an explicit attachment is declared. Each service's reachability is auditable from `networks:` alone.
- Defense in depth: only nginx and wireguard-server bind host ports. No accidental `0.0.0.0` exposures.
- NIS2 / utility audit: WBD is in scope as a water-sector entity. Compose networks evidence segmentation both at runtime and on paper.
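The blast-radius claim reduces to a simple rule: two containers can open a direct connection only if they share at least one Docker network. The Python sketch below applies that rule to the cloud attachment map, copied from the listing above. Indirect paths through a dual-homed service (for example grafana, which sits on both `app` and `data`) are deliberately out of scope.

```python
# Cloud network attachments, copied from the listing above.
CLOUD_ATTACHMENTS: dict[str, set[str]] = {
    "edge": {"nginx", "wireguard-server"},
    "app": {"nginx", "rabbitmq", "postfix", "node-red", "grafana",
            "jenkins", "gitea", "keycloak", "mlflow", "jupyterhub"},
    "data": {"influxdb", "sql", "grafana", "mlflow"},
    "mgmt": {"portainer", "keycloak", "wireguard-server", "jupyterhub"},
}


def networks_of(service: str, attachments: dict[str, set[str]]) -> set[str]:
    """Networks a service is attached to."""
    return {net for net, members in attachments.items() if service in members}


def can_reach(a: str, b: str, attachments: dict[str, set[str]]) -> bool:
    """Direct reachability: True if a and b share at least one network."""
    return bool(networks_of(a, attachments) & networks_of(b, attachments))


def peers(service: str, attachments: dict[str, set[str]]) -> set[str]:
    """Everything a compromised `service` could connect to directly."""
    reachable: set[str] = set()
    for net in networks_of(service, attachments):
        reachable |= attachments[net]
    reachable.discard(service)
    return reachable
```

For example, `peers("node-red", CLOUD_ATTACHMENTS)` lists only `app` members, so influxdb is absent.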
Special cases
Postfix (cloud + edge)
Postfix is outbound-only: it initiates SMTP to internet MX servers but accepts no inbound mail from the internet. It has no published port and no internet-facing listener; it only needs egress, which every container already has via host NAT.
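From an application's point of view, "outbound-only" means apps hand mail to Postfix over the internal network, and Postfix does the internet delivery. The standard-library Python sketch below shows that flow. The relay hostname `postfix` (taken to be the Compose service name, resolvable on `app`), port 25, and the example addresses are all assumptions.

```python
import smtplib
from email.message import EmailMessage

RELAY_HOST = "postfix"  # assumed Compose service name, resolvable on the app network
RELAY_PORT = 25         # assumed internal listener; nothing is published on the host


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message for the internal relay."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg: EmailMessage, host: str = RELAY_HOST, port: int = RELAY_PORT) -> None:
    # The app container only talks to the relay; Postfix performs MX lookup and delivery.
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```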
MQTT — two brokers
- RabbitMQ is the general-purpose broker. It runs at both cloud and edge with the MQTT plugin enabled. The cloud broker is reachable externally via the nginx stream proxy on `tcp/8883`; the edge broker is fully internal.
- Mosquitto is reserved for the FROST (SensorThings API) stack only and is cloud-only. It stays internal to its own stack with no external ingress, unless FROST publishers need to push from outside (in which case, use a separate `stream` block on a different port).
If FROST needs cross-broker forwarding, add a RabbitMQ shovel plugin pointing at mosquitto. Not wired up by default.
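For an external client, the stream proxy means opening TLS to port 8883 with the broker's hostname sent as SNI; the ingress table describes nginx as routing on SNI. Once the handshake completes, plain MQTT bytes travel on to `rabbitmq:1883`. The standard-library Python sketch below assumes MQTT 3.1.1 and takes a hypothetical broker hostname as input. It hand-builds a CONNECT packet only to show what crosses the proxy; a real client would use an MQTT library.

```python
import socket
import ssl


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n:
            byte |= 0x80
        out.append(byte)
        if not n:
            return bytes(out)


def utf8_field(s: str) -> bytes:
    """Two-byte big-endian length prefix followed by UTF-8 data."""
    data = s.encode("utf-8")
    return len(data).to_bytes(2, "big") + data


def connect_packet(client_id: str, keepalive: int = 60) -> bytes:
    """MQTT 3.1.1 CONNECT: clean session, no credentials, no will."""
    variable = utf8_field("MQTT") + bytes([4, 0x02]) + keepalive.to_bytes(2, "big")
    body = variable + utf8_field(client_id)
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def mqtt_tls_handshake(host: str, client_id: str, port: int = 8883) -> bytes:
    """Open TLS with SNI = host, send CONNECT, and return the raw CONNACK bytes."""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(connect_packet(client_id))
            data = b""
            while len(data) < 4:  # CONNACK: 0x20 0x02 <flags> <return code>
                chunk = tls.recv(4 - len(data))
                if not chunk:
                    break
                data += chunk
            return data
```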
Gitea — HTTPS only
No SSH ingress. GITEA__server__DISABLE_SSH=true. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push.
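`GITEA__server__DISABLE_SSH=true` follows the Gitea container's environment-to-ini convention, `GITEA__<section>__<KEY>`. The variable ends up as `DISABLE_SSH = true` under `[server]` in `app.ini`. The Python sketch below is a simplified illustration of that mapping. It skips the escape sequences and any case normalisation the real tool performs, so treat it as a reading aid only.

```python
import configparser
import io

PREFIX = "GITEA__"


def env_to_ini(environ: dict[str, str]) -> str:
    """Render GITEA__<section>__<KEY> variables as an app.ini fragment.

    Simplified sketch: no escape-sequence decoding, no case normalisation.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep KEY case as written (configparser lowercases by default)
    for name, value in sorted(environ.items()):
        if not name.startswith(PREFIX):
            continue  # unrelated variable such as PATH
        section, sep, key = name[len(PREFIX):].partition("__")
        if not sep or not key:
            continue  # malformed: needs both a section and a key
        if not cfg.has_section(section):
            cfg.add_section(section)
        cfg.set(section, key, value)
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```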
WireGuard server (cloud)
WireGuard is connectionless UDP with crypto-routed packets. Proxying through nginx-stream breaks NAT/MTU and adds no security benefit. The server publishes udp/51820 directly — the only non-nginx public ingress on cloud.
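The edge end of the tunnel explains why no edge port is published. The client names the cloud server as its `Endpoint` and sends periodic keepalives, so every flow is initiated outbound. The Python sketch below renders a `wg-quick`-style client config. The keys (`PrivateKey`, `Address`, `PublicKey`, `Endpoint`, `AllowedIPs`, `PersistentKeepalive`) are standard wg-quick settings; the hostname, subnets, key placeholders, and 25-second keepalive are illustrative assumptions.

```python
def render_client_conf(private_key: str, address: str, server_public_key: str,
                       endpoint_host: str, allowed_ips: list[str],
                       keepalive: int = 25) -> str:
    """Render a wg-quick client config that dials out to the cloud on udp/51820."""
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",
        # No listen port is set: the client uses an ephemeral source port,
        # so the plant side needs no inbound rule.
        "",
        "[Peer]",
        f"PublicKey = {server_public_key}",
        f"Endpoint = {endpoint_host}:51820",
        f"AllowedIPs = {', '.join(allowed_ips)}",
        # Keepalives hold the NAT mapping open so the cloud can reach an idle edge.
        f"PersistentKeepalive = {keepalive}",
        "",
    ])
```

The split-tunnel vs full-tunnel question comes down to the `allowed_ips` argument. Passing only the tunnel subnet gives a split tunnel; passing `0.0.0.0/0` sends everything through the cloud.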
Stacks
The repo defines 16 stacks under `stacks/`:
- Cloud + edge: `nginx-proxy`, `node-red`, `influxdb`, `grafana`, `keycloak`, `portainer`, `rabbitmq`, `postfix`
- Cloud-only: `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `mosquitto` (FROST)
- Edge-only: `wireguard-client`
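The layer lists compose by set union. A cloud deployment is the shared stacks plus the cloud-only ones; an edge deployment is the shared stacks plus the edge-only one. The Python sketch below copies the names from the lists above. `stacks_for` is a hypothetical helper that only illustrates the bookkeeping.

```python
SHARED = {"nginx-proxy", "node-red", "influxdb", "grafana",
          "keycloak", "portainer", "rabbitmq", "postfix"}
CLOUD_ONLY = {"wireguard-server", "gitea", "jenkins", "sql",
              "mlflow", "jupyterhub", "mosquitto"}
EDGE_ONLY = {"wireguard-client"}

LAYER_EXTRAS = {"cloud": CLOUD_ONLY, "edge": EDGE_ONLY}


def stacks_for(layer: str) -> list[str]:
    """Sorted stack directories under stacks/ that a layer deploys."""
    try:
        return sorted(SHARED | LAYER_EXTRAS[layer])
    except KeyError:
        raise ValueError(f"unknown layer: {layer!r}") from None


# Every stack directory the repo defines, regardless of layer.
ALL_STACKS = SHARED | CLOUD_ONLY | EDGE_ONLY
```

`stacks_for("cloud")` yields 15 names and `stacks_for("edge")` yields 9.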
Sites
| Site | Status |
|---|---|
| `gemaal1` | Scaffolded; awaiting hardware provisioning (`PLANT_LAN_IP`, WG peer key, OPCUA endpoint) |
Additional plants follow the same pattern under sites/<plant>/.
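Each site's override pins nginx's published ports to the plant-LAN address rather than every interface. The Python sketch below renders Compose short-syntax bindings (`HOST_IP:HOST_PORT:CONTAINER_PORT`) and refuses a wildcard address. The helper name and sample address are hypothetical, and the sketch assumes the plant LAN is IPv4.

```python
import ipaddress


def edge_port_bindings(plant_lan_ip: str, ports: tuple[int, ...] = (80, 443)) -> list[str]:
    """Compose short-syntax bindings that publish nginx on the plant LAN only."""
    ip = ipaddress.IPv4Address(plant_lan_ip)  # assumption: IPv4; raises ValueError otherwise
    if ip.is_unspecified:
        raise ValueError("refusing to bind to 0.0.0.0; use the plant-LAN address")
    return [f"{ip}:{port}:{port}" for port in ports]
```

In the override file itself, the same binding would normally use Compose interpolation, e.g. `"${PLANT_LAN_IP}:443:443"`.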
Conventions
- Folder names: kebab-case (`node-red`, `nginx-proxy`, `wireguard-server`).
- Compose filename: `compose.yml` (official Compose Spec).
- Composition: cloud/site composes pull stacks via `include:`. Stacks remain runnable standalone for testing.
- Secrets: `.env` (gitignored) + `.env.example` (committed with placeholders).
- Per-stack contents: `compose.yml`, `.env.example`, `README.md`, optional `config/`.
- OT layer: out of scope; PLC + OPCUA managed in a separate process.
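With `include:`, a top-level compose file reduces to a list of stack paths, each relative to the including file. The Python sketch below renders such a file for a site. It assumes a hypothetical layout in which a site file lives at `sites/<plant>/compose.yml`, so stacks sit at `../../stacks/<name>/compose.yml`. The output is plain YAML text written by hand, not through a YAML library.

```python
from pathlib import PurePosixPath


def render_include(stacks: list[str], root: str = "../../stacks") -> str:
    """Render a compose.yml that pulls each stack in via the top-level include: key.

    Assumption: the file is written to sites/<plant>/compose.yml, hence the default root.
    """
    base = PurePosixPath(root)
    lines = ["include:"]
    for name in stacks:
        lines.append(f"  - {base / name / 'compose.yml'}")
    return "\n".join(lines) + "\n"
```

A cloud-level file one directory below the repo root would pass `root="../stacks"` instead.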
Open decisions
Tracked here so we don't forget. Each lands when we harden the relevant stack.
- MinIO / artifact store: MLflow uses a local volume for now; switch to an S3-compatible MinIO sidecar when artifacts grow.
- JupyterHub auth: target Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`.
- WG client routing: split-tunnel vs full; per-peer `AllowedIPs` policy.
- MQTT cross-broker shovel: only if FROST consumers must see RabbitMQ traffic or vice versa.
- Internal PKI: for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
- Backup strategy: for `sql` (postgres), `influxdb`, `gitea-data`, `jenkins-home`, `mlflow-artifacts`.
- Provision Gemaal1: fill in `PLANT_LAN_IP`, WG peer key, OPCUA endpoint; deploy first stacks.
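For the JupyterHub auth item: Keycloak exposes its OIDC endpoints under `/realms/<realm>/protocol/openid-connect/`, so the `GenericOAuthenticator` settings can be derived from a base URL and a realm. The Python sketch below builds those values as a dict. Several things are assumptions:
- the base URL, realm, and client ID;
- a Keycloak version without the legacy `/auth` path prefix;
- trait names, which vary across oauthenticator releases (for example `username_claim`), so check them against the installed version.

```python
def keycloak_oauth_settings(base_url: str, realm: str, client_id: str,
                            hub_url: str) -> dict[str, object]:
    """GenericOAuthenticator settings pointing at a Keycloak realm (no /auth prefix)."""
    oidc = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect"
    return {
        "client_id": client_id,
        # client_secret deliberately omitted: read it from .env / the environment.
        "authorize_url": f"{oidc}/auth",
        "token_url": f"{oidc}/token",
        "userdata_url": f"{oidc}/userinfo",
        "oauth_callback_url": f"{hub_url.rstrip('/')}/hub/oauth_callback",
        "scope": ["openid", "profile", "email"],
        "username_claim": "preferred_username",  # trait name depends on oauthenticator version
        "login_service": "Keycloak",
    }
```

In `jupyterhub_config.py`, each entry could be applied with `setattr(c.GenericOAuthenticator, key, value)` after setting `c.JupyterHub.authenticator_class` to `oauthenticator.generic.GenericOAuthenticator`.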