Stage 5 - make the cloud composition spin up in one command and add
the SensorThings (FROST) stack as a fully segregated tenant.

cloud/deploy.sh - idempotent, 7-step bring-up:
  preflight → validate → up + wait → cert state → issue/renew →
  service status → endpoint smoke test. Reissues LE cert only when
  current issuer no longer matches ACME_CA_URI.
  Move-aside-then-restore-on-failure so the bootstrap cert survives
  a failed certbot.

stacks/frost - new stack, segregated from shared sql/rabbitmq:
  - dedicated postgis container (frost-db)
  - dedicated internal mosquitto bus (frost-mosquitto)
  - frost-http + frost-mqtt on a private frost-internal network,
    joined to cloud-app only for nginx ingress at frost.wbd-rd.nl
  - shared mosquitto stack deleted; rabbitmq remains the only public
    MQTT broker (mqtt.wbd-rd.nl:8883 via stream proxy)

stacks/sql - pg_isready healthcheck so keycloak/gitea/mlflow can gate
  on service_healthy via cloud-level depends_on overrides.

stacks/nginx-proxy:
  - nginx-init service generates a self-signed bootstrap cert on
    fresh deploy so nginx starts before certbot has issued a real one
  - frost.wbd-rd.nl vhost (/FROST-Server → frost-http:8080,
    /mqtt → frost-mqtt:9876 WebSocket)

stacks/mlflow - custom Dockerfile (upstream + psycopg2-binary) so the
  official image can speak to the shared sql backend.

stacks/jupyterhub - DummyAuthenticator stub gated by
  JUPYTERHUB_ADMIN_PASSWORD; TODO comments point at OIDC + DockerSpawner.

stacks/rabbitmq - config/{enabled_plugins,rabbitmq.conf} stubs
  (management + mqtt plugins, MQTT auth required).

stacks/portainer - ports unpublished; nginx now the only ingress.

stacks/node-red - pin to 4.1 (the floating "4" tag does not exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Architecture

R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:

- **Cloud** layer: central services, one deployment, internet-facing.
- **Edge** layer: per-plant, plant-LAN-facing, tunneled to cloud via WireGuard.
- **OT** layer: per-plant, behind edge. Managed **outside** this repo.

```
               Internet
                  │
      ┌───────────┴───────────┐
      │   tcp/80, 443, 8883   │
      │   udp/51820           │
      ▼                       │
┌────────────────────────────────────┐
│  Cloud (central, one)              │
│  nginx + certbot   ◀── 80/443/8883 │
│  wireguard-server  ◀── 51820/udp   │
│  gitea, jenkins, keycloak, ...     │
│  influxdb, grafana, node-red       │
│  rabbitmq, postfix, portainer      │
│  sql (postgres, single config)     │
│  mlflow, jupyterhub                │
│  frost (SensorThings API)          │
└───────────────┬────────────────────┘
                │ WireGuard tunnels
        ┌───────┼────────┬───────────┐
        ▼       ▼        ▼           ▼
    ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
    │ Edge: │ │ Edge: │ │ Edge: │ │  ...  │
    │gemaal1│ │  ...  │ │  ...  │ │       │
    └───┬───┘ └───────┘ └───────┘ └───────┘
        │ TLS
        ▼
   ┌──────────┐
   │    OT    │ ← out of scope of this repo
   │  OPCUA   │
   │   PLC    │
   └──────────┘
```

## Network topology (per layer)

Each layer uses **four Docker networks**. Only `data` is declared `internal: true`; the cloud layer additionally has a stack-private `frost-internal` network (see Cloud attachments).

| Network | Purpose | Notes |
|---|---|---|
| `edge` | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| `app` | Application / automation tier. | Default landing for app services. |
| `data` | Databases (influxdb, sql). | `internal: true` (no internet egress). |
| `mgmt` | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |

### Cloud attachments

```
edge : nginx, wireguard-server
app  : nginx, rabbitmq, postfix, node-red, grafana,
       jenkins, gitea, keycloak, mlflow, jupyterhub,
       portainer, frost-http, frost-mqtt
data : influxdb, sql, grafana, mlflow
mgmt : portainer, keycloak, wireguard-server, jupyterhub

frost-internal (private to frost stack) :
       frost-db (postgis), frost-mosquitto, frost-http, frost-mqtt
```

### Edge attachments

```
edge : nginx                               ← plant-LAN-facing
app  : nginx, rabbitmq, postfix, node-red,
       grafana, keycloak, wireguard-client
data : influxdb, grafana
mgmt : portainer, keycloak, wireguard-client
```

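
The egress consequence of `internal: true` can be checked mechanically. The sketch below transcribes the edge-layer listing into a Python dict and reports which services sit on at least one non-internal network. It assumes that `data` is the only network declared `internal: true` (the table says so for `data` and is silent on the others); the names `EDGE_ATTACHMENTS`, `networks_of` and `has_egress` are illustrative and not part of the repo.

```python
# Edge-layer attachments, transcribed from the listing above.
EDGE_ATTACHMENTS = {
    "edge": {"nginx"},
    "app": {"nginx", "rabbitmq", "postfix", "node-red",
            "grafana", "keycloak", "wireguard-client"},
    "data": {"influxdb", "grafana"},
    "mgmt": {"portainer", "keycloak", "wireguard-client"},
}

# Assumption: only `data` is declared with `internal: true`.
INTERNAL_NETWORKS = {"data"}


def networks_of(service, attachments=EDGE_ATTACHMENTS):
    """Return the set of networks a service is attached to."""
    return {net for net, members in attachments.items() if service in members}


def has_egress(service, attachments=EDGE_ATTACHMENTS, internal=INTERNAL_NETWORKS):
    """True if the service is on at least one non-internal network."""
    return bool(networks_of(service, attachments) - internal)


if __name__ == "__main__":
    for name in ("influxdb", "grafana", "postfix"):
        print(name, sorted(networks_of(name)), has_egress(name))
```

With this data, `influxdb` (on `data` only) has no egress, while `grafana` and `postfix` do because they are attached to `app`.
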
## Ingress (the only ports facing outside)

### Cloud (the central host)

| Port | Container | Notes |
|---|---|---|
| `tcp/80` | nginx | HTTP → 301 to 443; also serves `/.well-known/acme-challenge/` for certbot |
| `tcp/443` | nginx | All HTTPS UIs; TLS termination |
| `tcp/8883` | nginx | MQTT-TLS via `stream {}` block; SNI route to `rabbitmq:1883` |
| `udp/51820` | wireguard-server | VPN tunnel ingress |

Two containers publish a total of four ports. **Everything else is invisible** from outside the host.

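
One way to keep the "two containers, four ports" rule from drifting is a small check over the parsed compose services. The sketch below is an illustration only: it works on a plain dict shaped like the `services:` mapping, understands only the short `ports:` string syntax (no long syntax, no IPv6 host addresses), and the names `ALLOWED`, `normalise` and `unexpected_ports` are hypothetical.

```python
# Allow-list taken from the cloud ingress table: service -> published ports.
ALLOWED = {
    "nginx": {"80/tcp", "443/tcp", "8883/tcp"},
    "wireguard-server": {"51820/udp"},
}


def normalise(port_entry):
    """Reduce a short-syntax `ports:` entry to 'HOSTPORT/proto'.

    Handles 'HOST:CONTAINER' and 'IP:HOST:CONTAINER', each with an
    optional '/proto' suffix. A bare 'CONTAINER' entry is reported
    under its container port.
    """
    spec, _, proto = str(port_entry).partition("/")
    parts = spec.split(":")
    host_port = parts[-2] if len(parts) >= 2 else parts[-1]
    return f"{host_port}/{proto or 'tcp'}"


def unexpected_ports(services, allowed=ALLOWED):
    """Return {service: ports} for every published port not on the allow-list."""
    violations = {}
    for name, definition in services.items():
        published = {normalise(p) for p in definition.get("ports", [])}
        extra = published - allowed.get(name, set())
        if extra:
            violations[name] = extra
    return violations


if __name__ == "__main__":
    services = {
        "nginx": {"ports": ["80:80", "443:443", "8883:8883"]},
        "wireguard-server": {"ports": ["51820:51820/udp"]},
        "portainer": {},                      # unpublished, as intended
        "grafana": {"ports": ["3000:3000"]},  # deliberate violation
    }
    print(unexpected_ports(services))
```

Run against the sample data, only the deliberately misconfigured `grafana` entry is reported.
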
### Edge (per-plant gateway)

| Port | Container | Bound to |
|---|---|---|
| `tcp/80` | nginx | Plant-LAN interface only |
| `tcp/443` | nginx | Plant-LAN interface only |

The edge `wireguard-client` initiates outbound to the cloud, so it publishes **no port**.

## TLS strategy

**Stock nginx + certbot sidecar** (Let's Encrypt, HTTP-01 webroot).

- Stock `nginx:1.27-alpine` is required because we use the `stream {}` context for MQTT-TLS. `nginxproxy/nginx-proxy` (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly.
- `certbot/certbot` sidecar runs `certbot renew` every 12h. Shared `nginx-certs` + `nginx-acme-challenge` volumes coordinate cert + challenge state between the two containers.
- Initial issuance is **manual** (one-time `docker compose run --rm certbot certonly --webroot …`). Renewal is automatic.

For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by `sql`. Out of scope for first deploy.

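
Running `certbot renew` every 12h is cheap because each run only acts on certificates that are close to expiry. The sketch below illustrates that kind of check with the standard library only. It is not certbot's implementation: the 30-day window is an assumed value for illustration (certbot's actual threshold depends on its version and configuration), and `expires_at` / `due_for_renewal` are hypothetical names.

```python
import ssl
from datetime import datetime, timedelta, timezone


def expires_at(not_after):
    """Parse an X.509 notAfter string, e.g. 'Jan  1 00:00:00 2026 GMT'."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                  tz=timezone.utc)


def due_for_renewal(not_after, now, window_days=30):
    """True if the certificate expires within `window_days` of `now`.

    window_days=30 is an assumption made for this example.
    """
    return expires_at(not_after) - now <= timedelta(days=window_days)


if __name__ == "__main__":
    not_after = "Jan  1 00:00:00 2026 GMT"
    for now in (datetime(2025, 11, 1, tzinfo=timezone.utc),
                datetime(2025, 12, 15, tzinfo=timezone.utc)):
        print(now.date(), due_for_renewal(not_after, now))
```

The `notAfter` string format is the one Python's `ssl` module reports for peer certificates, so the same helper could be pointed at a live endpoint if needed.
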
## Why segment

- **Blast radius**: a compromised node-red on `app` cannot reach influxdb on `data` unless an explicit attachment is declared. Each service's reachability is auditable from `networks:` alone.
- **Defense in depth**: only nginx and wireguard-server bind host ports. No accidental `0.0.0.0` exposures.
- **NIS2 / utility audit**: WBD is in scope as a water-sector entity. Compose networks provide evidence of segmentation at runtime and on paper.

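
The claim that reachability is auditable from `networks:` alone can be illustrated with a few lines of Python. The sketch below transcribes the cloud attachment listing into a dict and treats two services as mutually reachable when they share at least one network. This models network-level reachability only and says nothing about authentication or firewalling inside a network; the names `CLOUD_ATTACHMENTS`, `shared_networks`, `can_reach` and `neighbours` are illustrative.

```python
# Cloud-layer attachments, transcribed from the "Cloud attachments" listing.
CLOUD_ATTACHMENTS = {
    "edge": {"nginx", "wireguard-server"},
    "app": {"nginx", "rabbitmq", "postfix", "node-red", "grafana",
            "jenkins", "gitea", "keycloak", "mlflow", "jupyterhub",
            "portainer", "frost-http", "frost-mqtt"},
    "data": {"influxdb", "sql", "grafana", "mlflow"},
    "mgmt": {"portainer", "keycloak", "wireguard-server", "jupyterhub"},
    "frost-internal": {"frost-db", "frost-mosquitto",
                       "frost-http", "frost-mqtt"},
}


def shared_networks(a, b, attachments=CLOUD_ATTACHMENTS):
    """Networks that both services are attached to."""
    return {net for net, members in attachments.items()
            if a in members and b in members}


def can_reach(a, b, attachments=CLOUD_ATTACHMENTS):
    """True if the two services share at least one network."""
    return bool(shared_networks(a, b, attachments))


def neighbours(service, attachments=CLOUD_ATTACHMENTS):
    """Every other service that shares a network with `service`."""
    found = set()
    for members in attachments.values():
        if service in members:
            found |= members
    found.discard(service)
    return found


if __name__ == "__main__":
    print("node-red -> influxdb:", can_reach("node-red", "influxdb"))
    print("nginx -> frost-db:", can_reach("nginx", "frost-db"))
    print("influxdb neighbours:", sorted(neighbours("influxdb")))
```

On this data, node-red cannot reach influxdb, nginx reaches `frost-http` over `app` but not `frost-db`, and influxdb's only neighbours are the other members of `data`.
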
## Special cases

### Postfix (cloud + edge)

Postfix is **outbound-only**. It initiates SMTP to internet MX servers but accepts no inbound mail: zero ingress, no published port, no internet-facing listener. It only needs egress, which every container on a non-internal network gets via host NAT.

### MQTT: RabbitMQ for public traffic, dedicated mosquitto inside FROST

- **RabbitMQ** is the **only public MQTT broker**. SCADA / IoT / edge clients connect to `mqtt.wbd-rd.nl:8883` (TLS, via nginx `stream {}` block proxying to `rabbitmq:1883`). Authentication uses the standard RABBITMQ_USER/PASS.
- **frost-mosquitto** lives **inside the frost stack** on the private `frost-internal` docker network. It is purely the message bus between `frost-http` and `frost-mqtt` and is not reachable from anywhere outside the frost stack.
- SensorThings-protocol MQTT (the FROST native MQTT API) is exposed to clients via `frost-mqtt`'s WebSocket port, proxied as `https://frost.wbd-rd.nl/mqtt`.

If FROST consumers also need to see SCADA traffic on RabbitMQ, add a RabbitMQ `shovel` plugin pointing into the frost stack. Not wired up by default.

### Gitea: HTTPS only

No SSH ingress. `GITEA__server__DISABLE_SSH=true`. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push.

### WireGuard server (cloud)

WireGuard is connectionless UDP with crypto-routed packets. Proxying through nginx-stream breaks NAT/MTU and adds no security benefit. The server publishes `udp/51820` directly, making it the **only** non-nginx public ingress on cloud.

## Stacks

The repo defines **16 stacks** under `stacks/`:

- **Cloud + edge** (8): `nginx-proxy`, `node-red`, `influxdb`, `grafana`, `keycloak`, `portainer`, `rabbitmq`, `postfix`
- **Cloud-only** (7): `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `frost` (SensorThings, dedicated postgis + internal bus)
- **Edge-only** (1): `wireguard-client`

## Sites

| Site | Status |
|---|---|
| `gemaal1` | Scaffolded; awaiting hardware provisioning (`PLANT_LAN_IP`, WG peer key, OPCUA endpoint) |

Additional plants follow the same pattern under `sites/<plant>/`.

## Conventions

- **Folder names**: kebab-case (`node-red`, `nginx-proxy`, `wireguard-server`).
- **Compose filename**: `compose.yml` (official Compose Spec).
- **Composition**: cloud/site composes pull stacks via `include:`. Stacks remain runnable standalone for testing.
- **Secrets**: `.env` (gitignored) + `.env.example` (committed with placeholders).
- **Per-stack contents**: `compose.yml`, `.env.example`, `README.md`, optional `config/`.
- **OT layer**: out of scope; PLC + OPCUA managed in a separate process.

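
To make the composition convention concrete, the sketch below derives the `include:` list for a layer from the stack grouping in the Stacks section. It is only an illustration: the relative path `../stacks/<name>/compose.yml` is an assumption about the directory layout (a site compose under `sites/<plant>/` would need a different prefix), and `STACKS`, `stacks_for` and `render_include` are hypothetical names.

```python
# Stack grouping as listed in the "Stacks" section.
STACKS = {
    "both": ["nginx-proxy", "node-red", "influxdb", "grafana",
             "keycloak", "portainer", "rabbitmq", "postfix"],
    "cloud": ["wireguard-server", "gitea", "jenkins", "sql",
              "mlflow", "jupyterhub", "frost"],
    "edge": ["wireguard-client"],
}


def stacks_for(layer):
    """Stacks a given layer ('cloud' or 'edge') would include."""
    if layer not in ("cloud", "edge"):
        raise ValueError(f"unknown layer: {layer!r}")
    return STACKS["both"] + STACKS[layer]


def render_include(layer, stacks_dir="../stacks"):
    """Render a Compose `include:` section (short path syntax) as text.

    `stacks_dir` is an assumed relative path, not taken from the repo.
    """
    lines = ["include:"]
    lines += [f"  - {stacks_dir}/{name}/compose.yml"
              for name in stacks_for(layer)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_include("edge"))
```

Keeping the grouping in one place also gives a quick consistency check: the three groups together should add up to the number of directories under `stacks/`.
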
## Open decisions

Tracked here so we don't forget. Each lands when we harden the relevant stack.

- **MinIO / artifact store**: MLflow uses a local volume for now; switch to an S3-compatible MinIO sidecar when artifacts grow.
- **JupyterHub auth**: target Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`.
- **WG client routing**: split-tunnel vs full; per-peer `AllowedIPs` policy.
- **FROST auth**: currently `BasicAuthProvider` against the USERS table in `frost-db`; swap to Keycloak OIDC via the FROST OIDC plugin when SSO is rolled out.
- **MQTT cross-broker shovel**: only if FROST consumers must see RabbitMQ traffic or vice versa.
- **Internal PKI**: for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
- **Backup strategy**: for `sql` (postgres), `influxdb`, `gitea-data`, `jenkins-home`, `mlflow-artifacts`.
- **Provision Gemaal1**: fill in `PLANT_LAN_IP`, WG peer key, OPCUA endpoint, deploy first stacks.