Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks.

Locked decisions
- sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning
- nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO).
  Chose stock over nginxproxy/nginx-proxy because stream{} is required for
  MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883.
- gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published.

MQTT split
- Remove stacks/mqtt placeholder.
- Add stacks/rabbitmq: general-purpose broker (AMQP + MQTT plugin),
  used at both cloud and edge. External MQTT clients reach cloud broker
  via nginx stream-proxy on 8883.
- Add stacks/mosquitto: reserved for the FROST (SensorThings) stack
  only. Cloud-only. Internal to its own stack; no external ingress.

ML / notebooks (cloud-only)
- stacks/mlflow: experiment tracking + model registry. Postgres backend
  on sql stack; local volume for artifacts (S3/MinIO is a TODO).
- stacks/jupyterhub: multi-user notebook server. DockerSpawner via
  mounted docker.sock; users spawn into cloud-app network so they can
  reach mlflow, influxdb (via grafana), rabbitmq.

Sites
- sites/gemaal1: first edge deployment scaffold. Site-local override
  template for binding nginx to PLANT_LAN_IP.

Docs
- README + docs/architecture.md updated: stacks table now lists 15 stacks,
  ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy
  section locked, MQTT-split section added, Gitea HTTPS-only noted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
171 lines
8.0 KiB
Markdown
# Architecture

R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:

- **Cloud** layer: central services, one deployment, internet-facing.
- **Edge** layer: per-plant, plant-LAN-facing, tunneled to cloud via WireGuard.
- **OT** layer: per-plant, behind edge. Managed **outside** this repo.

```
              Internet
                  │
      ┌───────────┴───────────┐
      │  tcp/80, 443, 8883    │
      │  udp/51820            │
      ▼                       │
┌────────────────────────────────────┐
│  Cloud (central, one)              │
│  nginx + certbot  ◀── 80/443/8883  │
│  wireguard-server ◀── 51820/udp    │
│  gitea, jenkins, keycloak, ...     │
│  influxdb, grafana, node-red       │
│  rabbitmq, postfix, portainer      │
│  sql (postgres, single config)     │
│  mlflow, jupyterhub                │
│  mosquitto (FROST stack only)      │
└───────────────┬────────────────────┘
                │ WireGuard tunnels
        ┌───────┼────────┬───────────┐
        ▼       ▼        ▼           ▼
    ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
    │ Edge: │ │ Edge: │ │ Edge: │ │  ...  │
    │gemaal1│ │  ...  │ │  ...  │ │       │
    └───┬───┘ └───────┘ └───────┘ └───────┘
        │ TLS
        ▼
   ┌──────────┐
   │    OT    │  ← out of scope of this repo
   │  OPCUA   │
   │   PLC    │
   └──────────┘
```

## Network topology (per layer)

Each layer uses **four Docker networks**:

| Network | Purpose | Notes |
|---|---|---|
| `edge` | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| `app` | Application / automation tier. | Default landing for app services. |
| `data` | Databases (influxdb, sql). | `internal: true` (no internet egress). |
| `mgmt` | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |

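As a rough sketch of how the table above could translate into Compose, the four networks might be declared as follows. The network names come from the table; the bare declarations for `edge`, `app`, and `mgmt` are assumptions, since the repo's actual network options are not shown here.

```yaml
# Hypothetical top-level networks section for one layer.
networks:
  edge: {}          # outermost; only port-publishing services attach
  app: {}           # default landing for application services
  data:
    internal: true  # externally isolated network, so no internet egress
  mgmt: {}          # identity and control plane; how "restricted" is enforced is not shown
```

Only `data` carries `internal: true` here because that is the only network the table marks that way.
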
### Cloud attachments

```
edge : nginx, wireguard-server
app  : nginx, rabbitmq, postfix, node-red, grafana,
       jenkins, gitea, keycloak, mlflow, jupyterhub
data : influxdb, sql, grafana, mlflow
mgmt : portainer, keycloak, wireguard-server, jupyterhub
```

(`mosquitto` joins `app` only when the FROST stack is deployed.)

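To illustrate how such attachments are typically expressed, here is a minimal Compose sketch for three of the cloud services. The image names for `grafana` and `influxdb` and the overall layout are assumptions for illustration, not the repo's actual stack files.

```yaml
# Illustrative only: a service joins exactly the networks listed for it.
services:
  nginx:
    image: nginx:1.27-alpine
    networks: [edge, app]   # faces the internet and proxies to app services
  grafana:
    image: grafana/grafana  # assumed image
    networks: [app, data]   # UI served on app, queries databases on data
  influxdb:
    image: influxdb:2       # assumed image
    networks: [data]        # reachable only from services that also join data

networks:
  edge: {}
  app: {}
  data:
    internal: true
```

Reading the `networks:` key of each service is enough to see which tiers it can reach.
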
### Edge attachments

```
edge : nginx                       ← plant-LAN-facing
app  : nginx, rabbitmq, postfix, node-red,
       grafana, keycloak, wireguard-client
data : influxdb, grafana
mgmt : portainer, keycloak, wireguard-client
```

## Ingress (the only ports facing outside)

### Cloud (the central host)

| Port | Container | Notes |
|---|---|---|
| `tcp/80` | nginx | HTTP → 301 to 443; also serves `/.well-known/acme-challenge/` for certbot |
| `tcp/443` | nginx | All HTTPS UIs; TLS termination |
| `tcp/8883` | nginx | MQTT-TLS via `stream {}` block; SNI route to `rabbitmq:1883` |
| `udp/51820` | wireguard-server | VPN tunnel ingress |

Two containers publish a total of four ports. **Everything else is invisible** from outside the host.

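The cloud ingress table could correspond to `ports:` entries along these lines. This is a sketch: the WireGuard image name is an assumption (the source does not name one), and every other service is assumed to declare no `ports:` at all.

```yaml
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "80:80"      # HTTP redirect and ACME challenge
      - "443:443"    # HTTPS UIs
      - "8883:8883"  # MQTT-TLS via stream {}
  wireguard-server:
    image: linuxserver/wireguard  # assumed image, for illustration only
    ports:
      - "51820:51820/udp"         # VPN tunnel ingress
```
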
### Edge (per-plant gateway)

| Port | Container | Bound to |
|---|---|---|
| `tcp/80` | nginx | Plant-LAN interface only |
| `tcp/443` | nginx | Plant-LAN interface only |

The edge `wireguard-client` initiates outbound to the cloud, so it publishes **no port**.

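Binding to the plant-LAN interface only can be expressed by prefixing the published port with a host IP. The sketch below assumes a site-local override file and a `PLANT_LAN_IP` variable supplied through `.env`; the file location is hypothetical.

```yaml
# Hypothetical site override, e.g. somewhere under sites/gemaal1/.
services:
  nginx:
    ports:
      # ${VAR:?message} makes Compose fail fast if PLANT_LAN_IP is unset.
      - "${PLANT_LAN_IP:?set PLANT_LAN_IP in .env}:80:80"
      - "${PLANT_LAN_IP:?set PLANT_LAN_IP in .env}:443:443"
```

Compose merges `ports:` lists across files rather than replacing them, so this assumes the base nginx stack does not already publish 80 and 443 on all interfaces.
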
## TLS strategy

**Stock nginx + certbot sidecar** (Let's Encrypt, HTTP-01 webroot).

- Stock `nginx:1.27-alpine`: required because we use the `stream {}` context for MQTT-TLS. `nginxproxy/nginx-proxy` (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly.
- `certbot/certbot` sidecar runs `certbot renew` every 12h. Shared `nginx-certs` + `nginx-acme-challenge` volumes coordinate cert + challenge state between the two containers.
- Initial issuance is **manual** (one-time `docker compose run --rm certbot certonly --webroot …`). Renewal is automatic.

For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by `sql`. Out of scope for first deploy.

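One common way to wire a renewing sidecar to nginx through shared volumes is sketched below. The volume names come from the text above; the mount paths and the shell loop are assumptions, not the repo's actual configuration.

```yaml
services:
  nginx:
    image: nginx:1.27-alpine
    volumes:
      - nginx-certs:/etc/letsencrypt:ro             # assumed mount path
      - nginx-acme-challenge:/var/www/certbot:ro    # assumed webroot path
  certbot:
    image: certbot/certbot
    volumes:
      - nginx-certs:/etc/letsencrypt
      - nginx-acme-challenge:/var/www/certbot
    # Replace the image's default certbot entrypoint with a 12h renew loop.
    # "$$" is Compose escaping for a literal "$".
    entrypoint: ["/bin/sh", "-c", "trap exit TERM; while :; do certbot renew; sleep 12h & wait $$!; done"]

volumes:
  nginx-certs: {}
  nginx-acme-challenge: {}
```

With the entrypoint overridden like this, a one-off issuance would need `docker compose run --rm --entrypoint certbot certbot certonly ...`. Reloading nginx after a renewal is not covered by this sketch.
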
## Why segment

- **Blast radius**: a compromised node-red on `app` cannot reach influxdb on `data` unless an explicit attachment is declared. Each service's reachability is auditable from `networks:` alone.
- **Defense in depth**: only nginx and wireguard-server bind host ports. No accidental `0.0.0.0` exposures.
- **NIS2 / utility audit**: WBD is in scope as a water-sector entity. Compose networks provide evidence of segmentation both at runtime and on paper.

## Special cases

### Postfix (cloud + edge)

Postfix is **outbound-only**. It initiates SMTP to internet MX servers but accepts no inbound mail: zero ingress, no published port, no internet-facing listener. It only needs egress, which containers on non-internal networks such as `app` get via host NAT.

### MQTT: two brokers

- **RabbitMQ** is the **general-purpose** broker. Runs at both cloud and edge. MQTT plugin enabled. Cloud-side reachable externally via nginx stream proxy on `tcp/8883`. Edge-side fully internal.
- **Mosquitto** is reserved for the **FROST (SensorThings API) stack** only and is cloud-only. It is internal to its own stack, with no external ingress unless FROST publishers need to push from outside (in which case use a separate stream block on a different port).

If FROST needs cross-broker forwarding, bridge the two brokers, for example with a Mosquitto bridge connection to RabbitMQ's MQTT listener. Note that RabbitMQ's `shovel` plugin speaks AMQP, not MQTT, so it cannot point at `mosquitto` directly. Not wired up by default.

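The external path to RabbitMQ (TLS on 8883 terminated by nginx, plain MQTT to `rabbitmq:1883`) might look roughly like the sketch below. It uses Compose inline `configs` content, which needs Docker Compose 2.23.1 or newer. The hostname `mqtt.example.org` and the certificate paths are hypothetical, and only the stream portion of the nginx configuration is shown.

```yaml
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "8883:8883"
    configs:
      - source: nginx_conf
        target: /etc/nginx/nginx.conf

configs:
  nginx_conf:
    content: |
      events {}
      stream {
        server {
          listen 8883 ssl;           # TLS terminates at nginx
          ssl_certificate     /etc/letsencrypt/live/mqtt.example.org/fullchain.pem;
          ssl_certificate_key /etc/letsencrypt/live/mqtt.example.org/privkey.pem;
          proxy_pass rabbitmq:1883;  # plain MQTT on the internal network
        }
      }
```

The certificate volume mount and any SNI-based routing are left out to keep the sketch short; a real deployment would also keep the `http {}` configuration for the HTTPS UIs.
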
### Gitea: HTTPS only

No SSH ingress. `GITEA__server__DISABLE_SSH=true`. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push.

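In Compose terms, the Gitea setting above could be applied as in this sketch. The image name and network attachment are assumptions; the environment variable is the one named in the text.

```yaml
services:
  gitea:
    image: gitea/gitea   # assumed image; pin a tag in practice
    environment:
      GITEA__server__DISABLE_SSH: "true"
    networks: [app]
    # No ports: section. nginx reaches Gitea's HTTP port (3000 by default) over app.

networks:
  app: {}
```
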
### WireGuard server (cloud)

WireGuard is connectionless UDP and uses cryptokey routing to associate packets with peers. Proxying it through nginx `stream` would break NAT/MTU handling and add no security benefit. The server therefore publishes `udp/51820` directly, the **only** non-nginx public ingress on cloud.

## Stacks

The repo defines **16 stacks** under `stacks/`:

- **Cloud + edge**: `nginx-proxy`, `node-red`, `influxdb`, `grafana`, `keycloak`, `portainer`, `rabbitmq`, `postfix`
- **Cloud-only**: `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `mosquitto` (FROST)
- **Edge-only**: `wireguard-client`

## Sites

| Site | Status |
|---|---|
| `gemaal1` | Scaffolded; awaiting hardware provisioning (PLANT_LAN_IP, WG peer key, OPCUA endpoint) |

Additional plants follow the same pattern under `sites/<plant>/`.

## Conventions

- **Folder names**: kebab-case (`node-red`, `nginx-proxy`, `wireguard-server`).
- **Compose filename**: `compose.yml` (official Compose Spec).
- **Composition**: cloud/site composes pull stacks via `include:`. Stacks remain runnable standalone for testing.
- **Secrets**: `.env` (gitignored) + `.env.example` (committed with placeholders).
- **Per-stack contents**: `compose.yml`, `.env.example`, `README.md`, optional `config/`.
- **OT layer**: out of scope; PLC + OPCUA managed in a separate process.

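The `include:` convention above might look like this in a top-level cloud compose file. The file location and relative paths are assumptions; `include` requires Docker Compose 2.20.0 or newer.

```yaml
# Hypothetical cloud-level compose.yml that pulls in individual stacks.
include:
  - ../stacks/nginx-proxy/compose.yml
  - ../stacks/sql/compose.yml
  - ../stacks/mlflow/compose.yml
```

Each included file keeps its own relative paths and `.env` handling, which is what lets a stack stay runnable on its own for testing.
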
## Open decisions

Tracked here so we don't forget. Each lands when we harden the relevant stack.

- **MinIO / artifact store**: MLflow uses local volume for now; switch to S3-compatible MinIO sidecar when artifacts grow.
- **JupyterHub auth**: target Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`.
- **WG client routing**: split-tunnel vs full; per-peer `AllowedIPs` policy.
- **MQTT cross-broker forwarding**: only if FROST consumers must see RabbitMQ traffic or vice versa.
- **Internal PKI**: for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
- **Backup strategy**: for `sql` (postgres), `influxdb`, `gitea-data`, `jenkins-home`, `mlflow-artifacts`.
- **Provision Gemaal1**: fill in `PLANT_LAN_IP`, WG peer key, OPCUA endpoint, deploy first stacks.