feat: SQL=postgres, nginx+certbot, MQTT split, ML stacks, gitea HTTPS-only, gemaal1 site

Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks.

Locked decisions
- sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning
- nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO).
  Chose stock over nginxproxy/nginx-proxy because stream{} is required for
  MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883.
- gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published.

MQTT split
- Remove stacks/mqtt placeholder.
- Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin),
  used at both cloud and edge. External MQTT clients reach cloud broker
  via nginx stream-proxy on 8883.
- Add stacks/mosquitto — reserved for the FROST (SensorThings) stack
  only. Cloud-only. Internal to its own stack; no external ingress.

ML / notebooks (cloud-only)
- stacks/mlflow — experiment tracking + model registry. Postgres backend
  on sql stack; local volume for artifacts (S3/MinIO is a TODO).
- stacks/jupyterhub — multi-user notebook server. DockerSpawner via
  mounted docker.sock; users spawn into cloud-app network so they can
  reach mlflow, influxdb (via grafana), rabbitmq.

Sites
- sites/gemaal1 — first edge deployment scaffold. Site-local override
  template for binding nginx to PLANT_LAN_IP.

Docs
- README + docs/architecture.md updated: stacks table now lists 16 stacks,
  ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy
  section locked, MQTT-split section added, Gitea HTTPS-only noted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
znetsixe
2026-05-21 13:22:46 +02:00
parent 8ab9061983
commit 2f5e3b4183
30 changed files with 492 additions and 116 deletions

View File

@@ -19,7 +19,7 @@ Stacks are pulled into the cloud and site composes via the Compose Spec `include
```bash
# Cloud hub (run on the central server)
cd cloud
cp .env.example .env # fill in real secrets
cp .env.example .env # fill in real secrets
docker compose up -d
# A plant edge (run on the edge gateway at the plant)
@@ -37,14 +37,23 @@ docker compose up -d
| grafana | Dashboards / SCADA | ✓ | ✓ |
| keycloak | Identity / SSO | ✓ | ✓ |
| portainer | Container management UI | ✓ | ✓ |
| nginx-proxy | HTTPS + MQTT-TLS reverse proxy | ✓ | ✓ |
| mqtt | MQTT broker + GUI | ✓ | ✓ |
| nginx-proxy | Stock nginx + certbot sidecar | ✓ | ✓ |
| rabbitmq | General-purpose broker (AMQP + MQTT plugin) | ✓ | ✓ |
| postfix | Outbound mail relay | ✓ | ✓ |
| wireguard-server | VPN server | ✓ | — |
| wireguard-client | VPN client | — | ✓ |
| gitea | Git server | ✓ | — |
| gitea | Git server (HTTPS-only) | ✓ | — |
| jenkins | CI/CD | ✓ | — |
| sql | Config DB (single point of config) | ✓ | — |
| sql | Config DB (postgres 16) | ✓ | — |
| mlflow | ML experiment tracking + registry | ✓ | — |
| jupyterhub | Multi-user notebook server | ✓ | — |
| mosquitto | MQTT broker for FROST stack only | ✓ | — |
## Sites
| Site | Status |
|---|---|
| gemaal1 | Scaffolded — awaiting hardware provisioning |
## Design
@@ -54,6 +63,6 @@ See [`docs/architecture.md`](docs/architecture.md) for the hub-and-spoke topolog
- kebab-case folder names
- `compose.yml` (Compose Spec), not `docker-compose.yml`
- Stack composes are pulled into cloud/site via `include:`
- Stack composes pulled into cloud/site via `include:`
- Secrets in `.env` files (gitignored); `.env.example` committed with placeholders
- OT layer (OPCUA, PLCs) is **out of scope** for this repo

View File

@@ -6,6 +6,7 @@ TZ=Europe/Amsterdam
# Domain / TLS
PRIMARY_DOMAIN=
LETSENCRYPT_EMAIL=
ACME_CA_URI=https://acme-v02.api.letsencrypt.org/directory
# WireGuard server
WG_SERVER_PORT=51820
@@ -14,6 +15,7 @@ WG_SERVER_PUBLIC_HOST=
# Keycloak (admin bootstrap)
KEYCLOAK_ADMIN=admin
KEYCLOAK_ADMIN_PASSWORD=
KEYCLOAK_HOSTNAME=
# InfluxDB
INFLUX_ADMIN_USER=admin
@@ -25,19 +27,38 @@ INFLUX_BUCKET=telemetry
# Grafana
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=
GRAFANA_ROOT_URL=
# SQL (single point of config)
# SQL (postgres — single point of config)
SQL_DB=config
SQL_USER=config
SQL_PASSWORD=
# RabbitMQ
RABBITMQ_USER=admin
RABBITMQ_PASSWORD=
RABBITMQ_VHOST=/
# Postfix
POSTFIX_RELAYHOST=
POSTFIX_FROM_DOMAIN=
# Gitea
# Gitea (HTTPS-only; uses sql backend)
GITEA_ROOT_URL=
GITEA_DB_HOST=sql:5432
GITEA_DB_NAME=gitea
GITEA_DB_USER=gitea
GITEA_DB_PASSWORD=
# Jenkins
JENKINS_ADMIN_USER=admin
JENKINS_ADMIN_PASSWORD=
# MLflow (uses sql backend)
MLFLOW_DB_NAME=mlflow
MLFLOW_DB_USER=mlflow
MLFLOW_DB_PASSWORD=
# JupyterHub
JUPYTER_NOTEBOOK_IMAGE=jupyter/datascience-notebook:latest
JUPYTERHUB_ADMIN_USERS=

View File

@@ -4,20 +4,29 @@
name: cloud
# Uncomment includes as each stack is scaffolded with real services.
# Uncomment includes as each stack is hardened beyond stub.
include:
# Core ingress + identity
# - ../stacks/nginx-proxy/compose.yml
# - ../stacks/wireguard-server/compose.yml
# - ../stacks/keycloak/compose.yml
# - ../stacks/portainer/compose.yml
# Data
# - ../stacks/sql/compose.yml
# - ../stacks/influxdb/compose.yml
# - ../stacks/grafana/compose.yml
# Apps
# - ../stacks/node-red/compose.yml
# - ../stacks/mqtt/compose.yml
# - ../stacks/postfix/compose.yml
# - ../stacks/grafana/compose.yml
# - ../stacks/gitea/compose.yml
# - ../stacks/jenkins/compose.yml
# - ../stacks/sql/compose.yml
# Messaging + mail
# - ../stacks/rabbitmq/compose.yml
# - ../stacks/postfix/compose.yml
# ML / notebooks
# - ../stacks/mlflow/compose.yml
# - ../stacks/jupyterhub/compose.yml
# FROST (when deployed)
# - ../stacks/mosquitto/compose.yml
networks:
edge:
@@ -29,7 +38,7 @@ networks:
data:
name: cloud-data
driver: bridge
internal: true # databases — no internet egress
internal: true # databases — no internet egress
mgmt:
name: cloud-mgmt
driver: bridge

View File

@@ -13,21 +13,23 @@ R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology:
│ tcp/80, 443, 8883 │
│ udp/51820 │
▼ │
┌───────────────────────────────────┐
│ Cloud (central, one) │
│ nginx-proxy ◀── 80/443/8883
│ wireguard-server ◀── 51820/udp │
│ gitea, jenkins, keycloak, ... │
│ influxdb, grafana, node-red │
│ mqtt, postfix, portainer
│ sql (single point of config) │
└───────────────┬───────────────────┘
┌────────────────────────────────────┐
│ Cloud (central, one)               │
│ nginx + certbot  ◀── 80/443/8883   │
│ wireguard-server ◀── 51820/udp     │
│ gitea, jenkins, keycloak, ...      │
│ influxdb, grafana, node-red        │
│ rabbitmq, postfix, portainer       │
│ sql (postgres, single config)      │
│ mlflow, jupyterhub                 │
│ mosquitto (FROST stack only)       │
└───────────────┬────────────────────┘
│ WireGuard tunnels
┌───────┼────────┬───────────┐
▼ ▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│ Edge: │ │ Edge: │ │ Edge: │ │ ... │
plant1 │ │plant2 │ │plant3 │ │ │
│gemaal1│ │ ...   │ │ ...   │ │       │
└───┬───┘ └───────┘ └───────┘ └───────┘
│ TLS
@@ -45,25 +47,27 @@ Each layer uses **four internal Docker networks**:
| Network | Purpose | Notes |
|---|---|---|
| `edge` | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. |
| `app` | Application / automation tier (node-red, grafana, jenkins, gitea, …). | Default landing for app services. |
| `app` | Application / automation tier. | Default landing for app services. |
| `data` | Databases (influxdb, sql). | `internal: true` — no internet egress. |
| `mgmt` | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. |
### Cloud attachments
```
edge : nginx-proxy, wireguard-server
app : nginx-proxy, mqtt, postfix, node-red, grafana,
jenkins, gitea, keycloak
data : influxdb, sql, grafana
mgmt : portainer, keycloak, wireguard-server
edge : nginx, wireguard-server
app : nginx, rabbitmq, postfix, node-red, grafana,
jenkins, gitea, keycloak, mlflow, jupyterhub
data : influxdb, sql, grafana, mlflow
mgmt : portainer, keycloak, wireguard-server, jupyterhub
```
(`mosquitto` joins `app` only when the FROST stack is deployed.)
### Edge attachments
```
edge : nginx-proxy ← plant-LAN-facing
app : nginx-proxy, mqtt, postfix, node-red,
edge : nginx ← plant-LAN-facing
app : nginx, rabbitmq, postfix, node-red,
grafana, keycloak, wireguard-client
data : influxdb, grafana
mgmt : portainer, keycloak, wireguard-client
@@ -75,9 +79,9 @@ mgmt : portainer, keycloak, wireguard-client
| Port | Container | Notes |
|---|---|---|
| `tcp/80` | nginx-proxy | HTTP → 301 to 443 |
| `tcp/443` | nginx-proxy | All HTTPS UIs; TLS termination |
| `tcp/8883` | nginx-proxy | MQTT-TLS via `stream {}` block, SNI route to broker |
| `tcp/80` | nginx | HTTP → 301 to 443; also serves `/.well-known/acme-challenge/` for certbot |
| `tcp/443` | nginx | All HTTPS UIs; TLS termination |
| `tcp/8883` | nginx | MQTT-TLS via `stream {}` block; SNI route to `rabbitmq:1883` |
| `udp/51820` | wireguard-server | VPN tunnel ingress |
Two containers publish a total of four ports. **Everything else is invisible** from outside the host.
@@ -86,34 +90,63 @@ Two containers publish a total of four ports. **Everything else is invisible** f
| Port | Container | Bound to |
|---|---|---|
| `tcp/80` | nginx-proxy | Plant-LAN interface only |
| `tcp/443` | nginx-proxy | Plant-LAN interface only |
| `tcp/80` | nginx | Plant-LAN interface only |
| `tcp/443` | nginx | Plant-LAN interface only |
The edge `wireguard-client` initiates outbound to the cloud — it publishes **no port**. On-site operators reach SCADA on the plant LAN; remote ops reach the same nginx via the WG tunnel.
The edge `wireguard-client` initiates outbound to the cloud — it publishes **no port**.
## TLS strategy
**Stock nginx + certbot sidecar** (Let's Encrypt, HTTP-01 webroot).
- Stock `nginx:1.27-alpine` — required because we use the `stream {}` context for MQTT-TLS. `nginxproxy/nginx-proxy` (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly.
- `certbot/certbot` sidecar runs `certbot renew` every 12h. Shared `nginx-certs` + `nginx-acme-challenge` volumes coordinate cert + challenge state between the two containers.
- Initial issuance is **manual** (one-time `docker compose run --rm certbot certonly --webroot …`). Renewal is automatic.
For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by `sql`. Out of scope for first deploy.
## Why segment
- **Blast radius**: a compromised node-red on `app` cannot reach influxdb on `data` unless an explicit attachment is declared. Each service's reachability is auditable from `networks:` alone.
- **Defense in depth**: only nginx-proxy and wireguard-server bind host ports. No accidental `0.0.0.0` exposures.
- **NIS2 / utility audit**: WBD is in scope as water-sector. Compose networks are a cheap way to evidence segmentation at runtime and on paper.
- **Defense in depth**: only nginx and wireguard-server bind host ports. No accidental `0.0.0.0` exposures.
- **NIS2 / utility audit**: WBD is in scope as water-sector. Compose networks evidence segmentation at runtime and on paper.
## Special cases
### Postfix (cloud + edge)
The diagram labels it "OUT ONLY". Postfix initiates outbound SMTP to internet MX servers but accepts **no inbound** mail. So Postfix has zero ingress, no published port, no listener facing the internet. It just needs egress (which every container has via host NAT).
Postfix is **outbound-only**. It initiates SMTP to internet MX servers and accepts no inbound mail: zero ingress, no published port, and no internet-facing listener. It needs only egress, which containers on non-internal networks get via host NAT.
### MQTT (cloud)
### MQTT — two brokers
nginx-proxy `stream {}` block reverse-proxies `tcp/8883` to the internal broker via SNI. The broker has **no published port**. Cleanest "everything through nginx" model.
- **RabbitMQ** is the **general-purpose** broker. Runs at both cloud and edge. MQTT plugin enabled. Cloud-side reachable externally via nginx stream proxy on `tcp/8883`. Edge-side fully internal.
- **Mosquitto** is reserved for the **FROST (SensorThings API) stack** only — cloud-only. Internal to its own stack — no external ingress unless FROST publishers need to push from outside (in which case use a separate stream block on a different port).
### MQTT (edge)
If FROST needs cross-broker forwarding, add a RabbitMQ `shovel` plugin pointing at `mosquitto`. Not wired up by default.
Broker is **fully internal** to the edge stack — no plant-LAN ingress. Node-RED on edge bridges OPCUA → broker, then broker → cloud broker over the WG tunnel. Field devices that need MQTT publish to the cloud broker via WG, not to the edge broker directly.
### Gitea — HTTPS only
No SSH ingress. `GITEA__server__DISABLE_SSH=true`. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push.
### WireGuard server (cloud)
WireGuard is connectionless UDP with crypto-routed packets. It cannot be sensibly reverse-proxied (NAT/MTU break, no security benefit). The server publishes `udp/51820` directly — the **only** non-nginx public ingress on cloud.
WireGuard is connectionless UDP with crypto-routed packets. Proxying through nginx-stream breaks NAT/MTU and adds no security benefit. The server publishes `udp/51820` directly — the **only** non-nginx public ingress on cloud.
## Stacks
The repo defines **16 stacks** under `stacks/`:
- **Cloud + edge**: `nginx-proxy`, `node-red`, `influxdb`, `grafana`, `keycloak`, `portainer`, `rabbitmq`, `postfix`
- **Cloud-only**: `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `mosquitto` (FROST)
- **Edge-only**: `wireguard-client`
## Sites
| Site | Status |
|---|---|
| `gemaal1` | Scaffolded; awaiting hardware provisioning (PLANT_LAN_IP, WG peer key, OPCUA endpoint) |
Additional plants follow the same pattern under `sites/<plant>/`.
## Conventions
@@ -126,10 +159,12 @@ WireGuard is connectionless UDP with crypto-routed packets. It cannot be sensibl
## Open decisions
These are deferred until we build the respective stack. Tracked here so we don't forget.
Tracked here so we don't forget. Each lands when we harden the relevant stack.
- **SQL flavor**: postgres / mariadb / mysql? Leaning postgres for the "single point of config" use case.
- **SSL strategy**: certbot inside nginx-proxy, acme-companion sidecar, or step-ca-driven internal PKI? Probably acme-companion against Let's Encrypt for external endpoints + internal PKI for service-to-service.
- **Keycloak storage**: bundled H2 (dev only) vs external SQL backend (probably the same `sql` stack).
- **Backup strategy** for `data` (influxdb, sql) and `mgmt` (gitea, jenkins workspaces).
- **First site**: which plant gets `sites/<plant>/` scaffolded first?
- **MinIO / artifact store** — MLflow uses local volume for now; switch to S3-compatible MinIO sidecar when artifacts grow.
- **JupyterHub auth** — target Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`.
- **WG client routing** — split-tunnel vs full; per-peer `AllowedIPs` policy.
- **MQTT cross-broker shovel** — only if FROST consumers must see RabbitMQ traffic or vice versa.
- **Internal PKI** — for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01.
- **Backup strategy** — for `sql` (postgres), `influxdb`, `gitea-data`, `jenkins-home`, `mlflow-artifacts`.
- **Provision Gemaal1** — fill in `PLANT_LAN_IP`, WG peer key, OPCUA endpoint, deploy first stacks.

View File

@@ -0,0 +1,32 @@
TZ=Europe/Amsterdam
# Plant LAN interface IP that nginx-proxy binds to (replaces 0.0.0.0)
PLANT_LAN_IP=
# WireGuard client peer config: see stacks/wireguard-client/config/wg0.conf
# InfluxDB (local DB at this plant)
INFLUX_ADMIN_USER=admin
INFLUX_ADMIN_PASSWORD=
INFLUX_ADMIN_TOKEN=
INFLUX_ORG=wbd
INFLUX_BUCKET=gemaal1
# Grafana (local SCADA at this plant)
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=
GRAFANA_ROOT_URL=
# Keycloak (local realm)
KEYCLOAK_ADMIN=admin
KEYCLOAK_ADMIN_PASSWORD=
KEYCLOAK_HOSTNAME=
# RabbitMQ (general broker, edge-local — internal only)
RABBITMQ_USER=admin
RABBITMQ_PASSWORD=
RABBITMQ_VHOST=gemaal1
# Postfix
POSTFIX_RELAYHOST=
POSTFIX_FROM_DOMAIN=

37
sites/gemaal1/README.md Normal file
View File

@@ -0,0 +1,37 @@
# gemaal1
Edge deployment for pumping station **Gemaal1** — first production site.
## Hardware (fill in when provisioned)
- Edge gateway model: ?
- Plant LAN subnet: ?.?.?.0/24
- WAN: ?
- OT VLAN (PLC + OPCUA): ?.?.?.0/24
- OPCUA endpoint: opc.tcp://?
## What runs here
nginx-proxy (plant-LAN-facing, certbot for TLS), wireguard-client (outbound tunnel to cloud), keycloak (local realm), portainer, influxdb (local DB), grafana (local SCADA), node-red, rabbitmq (general broker, internal only), postfix.
## Run
```bash
cp .env.example .env # fill in real secrets + PLANT_LAN_IP
docker compose up -d
docker compose ps
```
## Ingress
| Port | Bound to |
|---|---|
| tcp/80, 443 | `${PLANT_LAN_IP}` only |
Remote ops reach the same nginx via the WireGuard tunnel from cloud (no extra port published).
## OT uplink
Node-RED + EVOLV nodes talk to the OPCUA server on the OT VLAN. The edge gateway must have a NIC on that VLAN. OPCUA + PLC are **managed outside this repo**.
See [`../../docs/architecture.md`](../../docs/architecture.md) for the full topology.
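
A preflight check before the first `docker compose up -d` can catch an empty or mistyped `PLANT_LAN_IP`, since the nginx port bindings depend on it. The following is a minimal sketch only: the helper names `is_ipv4` and `check_plant_lan_ip` are hypothetical and not part of this repo, and the check is purely syntactic (it does not confirm the address is assigned to a local interface).

```bash
# Hypothetical preflight helpers; not part of the repo.

# Return 0 if $1 looks like a dotted-quad IPv4 address, 1 otherwise.
is_ipv4() {
  case "$1" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # empty, bad chars, stray dots
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1            # split on dots into positional parameters
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Read PLANT_LAN_IP from an env file (default: .env) and validate it.
check_plant_lan_ip() {
  env_file="${1:-.env}"
  # Use the last assignment in the file if the key appears more than once.
  ip=$(sed -n 's/^PLANT_LAN_IP=//p' "$env_file" | tail -n 1)
  if is_ipv4 "$ip"; then
    echo "PLANT_LAN_IP=$ip looks valid"
  else
    echo "PLANT_LAN_IP is missing or not an IPv4 address in $env_file" >&2
    return 1
  fi
}
```

One way to use it would be `check_plant_lan_ip .env && docker compose up -d`, so the stack does not start with an unset bind address.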

40
sites/gemaal1/compose.yml Normal file
View File

@@ -0,0 +1,40 @@
# Edge deployment for pumping station "Gemaal1".
# First production site; template for additional plants.
# Run: cp .env.example .env && docker compose up -d
name: edge-gemaal1
# Uncomment as each stack is hardened beyond stub.
include:
# - ../../stacks/nginx-proxy/compose.yml
# - ../../stacks/wireguard-client/compose.yml
# - ../../stacks/keycloak/compose.yml
# - ../../stacks/portainer/compose.yml
# - ../../stacks/influxdb/compose.yml
# - ../../stacks/grafana/compose.yml
# - ../../stacks/node-red/compose.yml
# - ../../stacks/rabbitmq/compose.yml
# - ../../stacks/postfix/compose.yml
# Site-specific overrides — bind nginx to the plant-LAN interface only.
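# Uncomment once nginx-proxy is wired up and PLANT_LAN_IP is set in .env.
# NOTE (to verify): Compose rejects a service defined here that conflicts with
# one pulled in via include:, so this block likely belongs in a
# compose.override.yml. `ports` entries also merge by appending, so the list
# probably needs the `!override` tag to replace the stack's 0.0.0.0 bindings.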
# services:
# nginx:
# ports:
# - "${PLANT_LAN_IP}:80:80"
# - "${PLANT_LAN_IP}:443:443"
networks:
edge:
name: edge-gemaal1-edge
driver: bridge
app:
name: edge-gemaal1-app
driver: bridge
data:
name: edge-gemaal1-data
driver: bridge
internal: true
mgmt:
name: edge-gemaal1-mgmt
driver: bridge

View File

@@ -1,5 +1,4 @@
GITEA_ROOT_URL=
GITEA_DB_TYPE=postgres
GITEA_DB_HOST=sql:5432
GITEA_DB_NAME=gitea
GITEA_DB_USER=gitea

View File

@@ -1,7 +1,8 @@
# gitea
Self-hosted git server. **Cloud-only stack.** (External repos live at `gitea.wbd-rd.nl`; this is for R&D internal use.)
Self-hosted git server. **Cloud-only.** (External repos live at `gitea.wbd-rd.nl`; this is for R&D internal use.)
- **Networks**: `app` (UI) + `data` (DB backend in `sql` stack)
- **Networks**: `app` (UI) + `data` (postgres backend in `sql` stack)
- **Volume**: `gitea-data` (repos + LFS + actions runners)
- **TODO**: SSH access strategy (nginx stream proxy port 22, or skip and use HTTPS-only), Keycloak OIDC, runners for Gitea Actions
- **SSH ingress**: **disabled** (`DISABLE_SSH=true`). All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners need SSH push semantics.
- **TODO**: Keycloak OIDC, Gitea Actions runners, LFS storage policy

View File

@@ -1,5 +1,6 @@
# gitea — git server (cloud only)
# Networks: app + data (uses sql stack as DB backend)
# Ingress: HTTPS-only via nginx-proxy. No SSH port published.
services:
gitea:
@@ -10,7 +11,8 @@ services:
USER_UID: "1000"
USER_GID: "1000"
GITEA__server__ROOT_URL: ${GITEA_ROOT_URL}
GITEA__database__DB_TYPE: ${GITEA_DB_TYPE:-postgres}
GITEA__server__DISABLE_SSH: "true"
GITEA__database__DB_TYPE: postgres
GITEA__database__HOST: ${GITEA_DB_HOST:-sql:5432}
GITEA__database__NAME: ${GITEA_DB_NAME:-gitea}
GITEA__database__USER: ${GITEA_DB_USER}
@@ -18,7 +20,6 @@ services:
TZ: ${TZ:-Europe/Amsterdam}
volumes:
- gitea-data:/data
# SSH port: TODO — decide if Gitea SSH (22/2222) is exposed via nginx stream or skipped
networks:
app:

View File

@@ -0,0 +1,5 @@
# Image spawned per user (override to a custom EVOLV image with mlflow client preinstalled)
JUPYTER_NOTEBOOK_IMAGE=jupyter/datascience-notebook:latest
# Admin users (comma-separated, consumed by jupyterhub_config.py)
JUPYTERHUB_ADMIN_USERS=

View File

@@ -0,0 +1,16 @@
# jupyterhub
Multi-user JupyterHub. Each authenticated user gets their own notebook container via DockerSpawner. **Cloud-only.**
- **Networks**: `app` (UI proxied at `/jupyter` or subdomain) + `mgmt` (Docker socket so JupyterHub can spawn user containers)
- **Spawned user containers** land on the `cloud-app` network so they can reach mlflow, influxdb (via grafana proxy), rabbitmq
- **Config**: `config/jupyterhub_config.py` — DockerSpawner setup, authenticator, admin list, resource limits
## TODO
- DockerSpawner config (image, network, user volumes, idle culling)
- Keycloak OAuth via `oauthenticator.generic.GenericOAuthenticator`
- Build a project-specific notebook image with EVOLV libs + mlflow client + InfluxDB client preinstalled
- Per-user persistent volume mounted at `/home/jovyan/work`
- CPU / memory limits per user container
- Cull idle servers (`c.JupyterHub.services` cull-idle pattern)

View File

@@ -0,0 +1,25 @@
# jupyterhub — multi-user notebook server (cloud only)
# Networks: app (UI proxied by nginx) + mgmt (Docker socket for DockerSpawner)
services:
jupyterhub:
image: jupyterhub/jupyterhub:5
restart: unless-stopped
networks: [app, mgmt]
volumes:
- jupyterhub-data:/srv/jupyterhub
- /var/run/docker.sock:/var/run/docker.sock
- ./config/jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro
environment:
TZ: ${TZ:-Europe/Amsterdam}
DOCKER_NOTEBOOK_IMAGE: ${JUPYTER_NOTEBOOK_IMAGE:-jupyter/datascience-notebook:latest}
DOCKER_NETWORK_NAME: cloud-app
# TODO: DockerSpawner config in jupyterhub_config.py; Keycloak OIDC via GenericOAuthenticator;
# preinstalled libraries; per-user persistent volumes; CPU/memory limits
networks:
app:
mgmt:
volumes:
jupyterhub-data:

View File

@@ -0,0 +1,3 @@
MLFLOW_DB_NAME=mlflow
MLFLOW_DB_USER=mlflow
MLFLOW_DB_PASSWORD=

12
stacks/mlflow/README.md Normal file
View File

@@ -0,0 +1,12 @@
# mlflow
MLflow tracking server + model registry. Used by data scientists running experiments from JupyterHub or local laptops. **Cloud-only.**
- **Networks**: `app` (UI on port 5000, reverse-proxied at `/mlflow` or subdomain) + `data` (postgres backend in `sql`)
- **Backend store**: postgres database `mlflow` — must be provisioned by `sql/config/init.d/`
- **Artifact store**: local volume `mlflow-artifacts`. Switch to S3/MinIO when artifact volume grows beyond a few GB.
- **TODO**:
- Provision `mlflow` DB + role in `sql` init scripts
- Keycloak OIDC via nginx `auth_request` (MLflow ships only experimental basic auth and no native OIDC, so nginx must front it)
- MinIO sidecar for S3-compatible artifact store
- Retention / cleanup policy for stale runs

27
stacks/mlflow/compose.yml Normal file
View File

@@ -0,0 +1,27 @@
# mlflow — experiment tracking + model registry (cloud only)
# Networks: app (UI on 5000, proxied by nginx) + data (postgres backend on sql stack)
services:
mlflow:
image: ghcr.io/mlflow/mlflow:v2.18.0
restart: unless-stopped
networks: [app, data]
command: >
mlflow server
--host 0.0.0.0
--port 5000
--backend-store-uri postgresql://${MLFLOW_DB_USER}:${MLFLOW_DB_PASSWORD}@sql:5432/${MLFLOW_DB_NAME}
--default-artifact-root /mlflow/artifacts
--serve-artifacts
volumes:
- mlflow-artifacts:/mlflow/artifacts
environment:
TZ: ${TZ:-Europe/Amsterdam}
# TODO: switch artifact store to S3/MinIO; Keycloak OIDC via nginx auth_request;
# verify the image bundles a postgres driver (psycopg2), else build a derived image
networks:
app:
data:
volumes:
mlflow-artifacts:

View File

@@ -0,0 +1 @@
# mosquitto — broker uses config file, no env vars in stub

View File

@@ -0,0 +1,11 @@
# mosquitto
Eclipse Mosquitto MQTT broker. **Reserved for the FROST (SensorThings API) stack** — separate from the general-purpose `rabbitmq` broker. **Cloud-only.**
- **Network**: `app` (internal only — FROST services connect via service name `mosquitto`)
- **No external ingress** by default. If FROST needs external MQTT publishers, route them through a separate nginx stream block on a different port (not 8883 — that belongs to rabbitmq).
- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence
- **TODO**:
- ACL aligned with FROST topic structure
- Persistence retention policy
- Optional shovel from `rabbitmq` if cross-broker forwarding is needed

View File

@@ -0,0 +1,22 @@
# mosquitto — MQTT broker reserved for the FROST (SensorThings) stack
# Cloud-only. Internal to its own stack; no external ingress by default.
# Networks: app
services:
mosquitto:
image: eclipse-mosquitto:2.0
restart: unless-stopped
networks: [app]
volumes:
- ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
- mosquitto-data:/mosquitto/data
- mosquitto-log:/mosquitto/log
# No 'ports:' — FROST is the only intended consumer. If external MQTT
# access for FROST is needed later, add a separate nginx stream block.
networks:
app:
volumes:
mosquitto-data:
mosquitto-log:

View File

@@ -1,2 +0,0 @@
# mqtt — broker uses config file, not env
# GUI vars land here once a GUI image is chosen

View File

@@ -1,8 +0,0 @@
# mqtt
Eclipse Mosquitto broker. Cloud-side accepts external connections via nginx stream proxy. Edge-side is fully internal.
- **Network**: `app` (no published port — nginx-proxy fronts external traffic on cloud)
- **Edge note**: on edge stacks, broker stays purely internal; node-red bridges OPCUA → broker → cloud broker over WG
- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence
- **TODO**: ACL policy, bridge config to cloud broker, GUI choice (mqtt-explorer / hivemq web / custom)

View File

@@ -1,24 +0,0 @@
# mqtt — MQTT broker (Eclipse Mosquitto) + optional GUI
# Networks: app (no port published — reached via nginx stream proxy on 8883)
services:
mqtt-broker:
image: eclipse-mosquitto:2.0
restart: unless-stopped
networks: [app]
volumes:
- ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
- mqtt-data:/mosquitto/data
- mqtt-log:/mosquitto/log
# No 'ports:' — nginx-proxy stream-proxies external 8883 to internal 1883/8883
# mqtt-gui: # TODO: choose a GUI image (mqtt-explorer? hivemq web client? custom)
# image: ...
# networks: [app]
networks:
app:
volumes:
mqtt-data:
mqtt-log:

View File

@@ -1,2 +1,4 @@
# nginx-proxy — config-file-driven, no env vars in stub
# Domain + cert settings will land here once SSL strategy is chosen
LETSENCRYPT_EMAIL=
# Production CA: https://acme-v02.api.letsencrypt.org/directory
# Staging CA (testing): https://acme-staging-v02.api.letsencrypt.org/directory
ACME_CA_URI=https://acme-v02.api.letsencrypt.org/directory

View File

@@ -1,12 +1,47 @@
# nginx-proxy
The single web ingress. Reverse-proxies HTTPS UIs and stream-proxies MQTT-TLS.
The single web ingress for cloud + edge. Reverse-proxies HTTPS UIs and stream-proxies MQTT-TLS to RabbitMQ. TLS certificates managed by a certbot sidecar (Let's Encrypt, HTTP-01 webroot challenge).
- **Image**: stock `nginx:1.27-alpine` (we don't use `nginxproxy/nginx-proxy` because we need the `stream {}` context for MQTT-TLS, which that image doesn't expose cleanly)
- **Sidecar**: `certbot/certbot:latest` — renews every 12h, shared `nginx-certs` + `nginx-acme-challenge` volumes
- **Networks**: `edge` (the only port-publisher) + `app` (talks to upstream services)
- **Host ports**: `tcp/80`, `tcp/443`, `tcp/8883`
- **Config**:
- `config/nginx.conf` — base
- `config/conf.d/*.conf` — HTTP vhosts (one per upstream UI)
- `config/stream.d/mqtt.conf` — MQTT-TLS stream block, SNI route to mqtt broker
- `config/certs/` — TLS certs (volume-mounted from cert manager)
- **TODO**: pick SSL strategy (acme-companion sidecar vs certbot vs internal PKI), write vhost templates per upstream
## Config layout
```
config/
├── nginx.conf # base config — must include `stream {}` directive
├── conf.d/ # HTTP vhosts (one per upstream UI)
│ ├── grafana.conf
│ ├── node-red.conf
│ ├── gitea.conf
│ └── ...
└── stream.d/
└── mqtt.conf # MQTT-TLS stream block, SNI route to rabbitmq:1883
```
Volumes:
- `nginx-certs` — Let's Encrypt cert chains (`/etc/letsencrypt`), read-only mounted into nginx, writable from certbot
- `nginx-acme-challenge` — webroot for HTTP-01 challenges (`/var/www/certbot`)
## Initial cert issuance
1. Start with HTTP-only nginx config (serving `/.well-known/acme-challenge/`).
2. Issue:
```bash
docker compose run --rm certbot certonly \
--webroot -w /var/www/certbot \
--email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \
-d gitea.example.com -d grafana.example.com -d nodered.example.com
```
3. Drop HTTPS vhost configs into `config/conf.d/` and reload nginx.
The sidecar then renews automatically.
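
Step 1 above needs an HTTP-only vhost that serves the challenge directory before any certificate exists. As a rough sketch, such a vhost could be generated like this. The function name `render_acme_bootstrap_conf` and the target file name are assumptions. The `root` matches the `/var/www/certbot` webroot used in the `certonly` command, and the 301 follows the ingress table in `docs/architecture.md`.

```bash
# Hypothetical helper: print an HTTP-only vhost for the first certbot run.
render_acme_bootstrap_conf() {
  # Quoted 'EOF' keeps nginx variables such as $host from being expanded
  # by the shell.
  cat <<'EOF'
server {
    listen 80;
    server_name _;

    # certbot --webroot writes challenge files under this root.
    location /.well-known/acme-challenge/ {
        root /var/www/certbot;
    }

    # Everything else is redirected to HTTPS.
    location / {
        return 301 https://$host$request_uri;
    }
}
EOF
}
```

A possible workflow: `render_acme_bootstrap_conf > config/conf.d/00-acme.conf` (file name hypothetical), then `docker compose exec nginx nginx -t` followed by `docker compose exec nginx nginx -s reload`.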
## TODO
- Write base `config/nginx.conf` (`http` + `stream` contexts)
- Per-upstream vhost templates with OIDC `auth_request` to Keycloak
- Decide internal PKI vs Let's Encrypt for cloud-internal hostnames not reachable from the public internet
- Edge-side variant: bind to plant-LAN IP only, internal CA for plant.local hostnames

View File

@@ -1,22 +1,44 @@
# nginx-proxy — TLS reverse proxy (HTTPS + MQTT-TLS)
# nginx-proxy — TLS reverse proxy (HTTPS + MQTT-TLS stream proxy)
# Stock nginx + certbot sidecar for Let's Encrypt automation.
# Networks: edge (port publisher) + app (proxy targets)
# Publishes: 80, 443, 8883 on the host
services:
nginx-proxy:
nginx:
image: nginx:1.27-alpine
restart: unless-stopped
networks: [edge, app]
ports:
- "80:80"
- "443:443"
- "8883:8883" # MQTT-TLS via stream{} block
- "8883:8883" # MQTT-TLS via stream{} block, SNI route to rabbitmq
volumes:
- ./config/nginx.conf:/etc/nginx/nginx.conf:ro
- ./config/conf.d:/etc/nginx/conf.d:ro
- ./config/stream.d:/etc/nginx/stream.d:ro
- ./config/nginx.conf:/etc/nginx/nginx.conf:ro
- nginx-certs:/etc/nginx/certs:ro
# TODO: SSL strategy (acme-companion sidecar vs certbot vs internal PKI)
- nginx-certs:/etc/letsencrypt:ro
- nginx-acme-challenge:/var/www/certbot:ro
depends_on:
- certbot
certbot:
image: certbot/certbot:latest
restart: unless-stopped
volumes:
- nginx-certs:/etc/letsencrypt
- nginx-acme-challenge:/var/www/certbot
entrypoint: /bin/sh -c
command: >
"trap exit TERM;
while :; do
certbot renew --webroot -w /var/www/certbot --quiet;
sleep 12h & wait $${!};
done"
# Initial issuance is manual:
# docker compose run --rm certbot certonly \
# --webroot -w /var/www/certbot \
# --email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \
# -d <host1> -d <host2> ...
networks:
edge:
@@ -24,3 +46,4 @@ networks:
volumes:
nginx-certs:
nginx-acme-challenge:

View File

@@ -0,0 +1,3 @@
RABBITMQ_USER=admin
RABBITMQ_PASSWORD=
RABBITMQ_VHOST=/

17
stacks/rabbitmq/README.md Normal file
View File

@@ -0,0 +1,17 @@
# rabbitmq
General-purpose message broker. AMQP for app-to-app messaging; MQTT plugin for external MQTT-TLS clients (fronted by nginx-proxy stream block).
Used at both cloud and edge. The `mosquitto` stack is reserved for the FROST SensorThings stack only — do not confuse the two.
- **Network**: `app` — no published port
- **External MQTT** clients reach `nginx-proxy:8883` (cloud) which stream-proxies to `rabbitmq:1883` internally. Edge brokers are internal-only.
- **Management UI**: port 15672 → reverse-proxied through nginx-proxy
- **Plugins to enable** in `config/enabled_plugins`:
```
[rabbitmq_management, rabbitmq_mqtt, rabbitmq_web_mqtt].
```
- **TODO**:
- ACL policies
- VHost layout (cloud-bus, plant-buses, ml-bus)
- Shovel/federation to `mosquitto` if FROST needs MQTT bridge to general broker
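
The compose file mounts `config/rabbitmq.conf`, which this commit does not yet provide. A sketch of MQTT-related settings it might carry follows, assuming the plain listener on 1883 that nginx proxies to and no anonymous MQTT logins. The helper name `render_rabbitmq_conf` is hypothetical, and the values are illustrative rather than a decided policy.

```bash
# Hypothetical helper: print MQTT plugin settings for rabbitmq.conf.
render_rabbitmq_conf() {
  vhost="${1:-/}"   # vhost that MQTT connections are mapped onto
  cat <<EOF
# Plain MQTT listener; TLS is terminated by nginx in front of it.
mqtt.listeners.tcp.default = 1883
# Require credentials for MQTT clients.
mqtt.allow_anonymous = false
# Map MQTT connections onto this vhost.
mqtt.vhost = ${vhost}
EOF
}
```

For an edge site this could be called as `render_rabbitmq_conf gemaal1 > config/rabbitmq.conf`, matching the `RABBITMQ_VHOST` value in that site's `.env.example`.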

View File

@@ -0,0 +1,26 @@
# rabbitmq — general-purpose message broker (AMQP + MQTT plugin)
# Used at both cloud and edge. Cloud-side: external clients reach it via
# nginx stream proxy on tcp/8883. Edge-side: internal only.
# Networks: app
services:
rabbitmq:
image: rabbitmq:3.13-management
restart: unless-stopped
networks: [app]
environment:
RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER}
RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}
RABBITMQ_DEFAULT_VHOST: ${RABBITMQ_VHOST:-/}
TZ: ${TZ:-Europe/Amsterdam}
volumes:
- rabbitmq-data:/var/lib/rabbitmq
- ./config/enabled_plugins:/etc/rabbitmq/enabled_plugins:ro
- ./config/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
# No 'ports:' — nginx-proxy stream-proxies external 8883 to internal 1883
networks:
app:
volumes:
rabbitmq-data:

View File

@@ -1,11 +1,9 @@
# sql
Central configuration database — the "single point of config" backing Keycloak, Gitea, and other stacks that need a relational store. **Cloud-only stack.**
Central configuration database — the "single point of config" backing Keycloak, Gitea, MLflow, and any stack that needs a relational store. **Cloud-only.**
- **Engine**: postgres 16-alpine
- **Network**: `data` only (no internet egress)
- **Engine**: stub uses **postgres:16-alpine** pending decision (postgres vs mariadb vs mysql)
- **Volume**: `sql-data`
- **TODO**:
- confirm engine choice (likely postgres for keycloak + gitea compatibility)
- per-app database/role provisioning (init scripts in `config/init.d/`)
- backup strategy (pg_dump cron sidecar vs streaming replica)
- **Volume**: `sql-data` (PGDATA)
- **Init scripts**: `config/init.d/*.sql` run once, on first start with an empty data directory, and provision per-app databases/roles (gitea, keycloak, mlflow, …)
- **TODO**: backup strategy (pg_dump cron sidecar vs streaming replica)
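
The per-app provisioning is not written yet. Since the postgres image also runs `*.sh` files from `/docker-entrypoint-initdb.d`, one option is a small shell script that generates the SQL from environment variables. The sketch below is an assumption-laden illustration: the function name `emit_provision_sql` is hypothetical, and it presumes the app passwords (for example `MLFLOW_DB_PASSWORD`) are passed into the `sql` container, which the current compose file does not show.

```bash
# Hypothetical init.d helper: print SQL that creates one role and one
# database owned by that role.
emit_provision_sql() {
  app="$1"
  sq="'"
  # Double any single quotes so the password is a valid SQL string literal.
  pw=$(printf '%s' "$2" | sed "s/'/''/g")
  printf 'CREATE ROLE "%s" LOGIN PASSWORD %s%s%s;\n' "$app" "$sq" "$pw" "$sq"
  printf 'CREATE DATABASE "%s" OWNER "%s";\n' "$app" "$app"
}

# Inside the container the script would pipe the output to psql, e.g.:
#   emit_provision_sql mlflow "$MLFLOW_DB_PASSWORD" |
#     psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" -d "$POSTGRES_DB"
```

Keeping the SQL generation separate from the `psql` call makes it easy to inspect what would be executed before wiring it into the init directory.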

View File

@@ -1,6 +1,5 @@
# sql — single point of config DB (cloud only)
# sql — central config DB (postgres, cloud only)
# Networks: data (no internet egress)
# TBD: postgres / mariadb / mysql — stub uses postgres pending decision
services:
sql:
@@ -14,6 +13,7 @@ services:
TZ: ${TZ:-Europe/Amsterdam}
volumes:
- sql-data:/var/lib/postgresql/data
- ./config/init.d:/docker-entrypoint-initdb.d:ro
networks:
data: