diff --git a/README.md b/README.md index 9daedde..8f19b1f 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ Stacks are pulled into the cloud and site composes via the Compose Spec `include ```bash # Cloud hub (run on the central server) cd cloud -cp .env.example .env # fill in real secrets +cp .env.example .env # fill in real secrets docker compose up -d # A plant edge (run on the edge gateway at the plant) @@ -37,14 +37,23 @@ docker compose up -d | grafana | Dashboards / SCADA | ✓ | ✓ | | keycloak | Identity / SSO | ✓ | ✓ | | portainer | Container management UI | ✓ | ✓ | -| nginx-proxy | HTTPS + MQTT-TLS reverse proxy | ✓ | ✓ | -| mqtt | MQTT broker + GUI | ✓ | ✓ | +| nginx-proxy | Stock nginx + certbot sidecar | ✓ | ✓ | +| rabbitmq | General-purpose broker (AMQP + MQTT plugin) | ✓ | ✓ | | postfix | Outbound mail relay | ✓ | ✓ | | wireguard-server | VPN server | ✓ | — | | wireguard-client | VPN client | — | ✓ | -| gitea | Git server | ✓ | — | +| gitea | Git server (HTTPS-only) | ✓ | — | | jenkins | CI/CD | ✓ | — | -| sql | Config DB (single point of config) | ✓ | — | +| sql | Config DB (postgres 16) | ✓ | — | +| mlflow | ML experiment tracking + registry | ✓ | — | +| jupyterhub | Multi-user notebook server | ✓ | — | +| mosquitto | MQTT broker for FROST stack only | ✓ | — | + +## Sites + +| Site | Status | +|---|---| +| gemaal1 | Scaffolded — awaiting hardware provisioning | ## Design @@ -54,6 +63,6 @@ See [`docs/architecture.md`](docs/architecture.md) for the hub-and-spoke topolog - kebab-case folder names - `compose.yml` (Compose Spec), not `docker-compose.yml` -- Stack composes are pulled into cloud/site via `include:` +- Stack composes pulled into cloud/site via `include:` - Secrets in `.env` files (gitignored); `.env.example` committed with placeholders - OT layer (OPCUA, PLCs) is **out of scope** for this repo diff --git a/cloud/.env.example b/cloud/.env.example index c34a044..311aeb0 100644 --- a/cloud/.env.example +++ b/cloud/.env.example @@ -6,6 
+6,7 @@ TZ=Europe/Amsterdam # Domain / TLS PRIMARY_DOMAIN= LETSENCRYPT_EMAIL= +ACME_CA_URI=https://acme-v02.api.letsencrypt.org/directory # WireGuard server WG_SERVER_PORT=51820 @@ -14,6 +15,7 @@ WG_SERVER_PUBLIC_HOST= # Keycloak (admin bootstrap) KEYCLOAK_ADMIN=admin KEYCLOAK_ADMIN_PASSWORD= +KEYCLOAK_HOSTNAME= # InfluxDB INFLUX_ADMIN_USER=admin @@ -25,19 +27,38 @@ INFLUX_BUCKET=telemetry # Grafana GRAFANA_ADMIN_USER=admin GRAFANA_ADMIN_PASSWORD= +GRAFANA_ROOT_URL= -# SQL (single point of config) +# SQL (postgres — single point of config) SQL_DB=config SQL_USER=config SQL_PASSWORD= +# RabbitMQ +RABBITMQ_USER=admin +RABBITMQ_PASSWORD= +RABBITMQ_VHOST=/ + # Postfix POSTFIX_RELAYHOST= POSTFIX_FROM_DOMAIN= -# Gitea +# Gitea (HTTPS-only; uses sql backend) GITEA_ROOT_URL= +GITEA_DB_HOST=sql:5432 +GITEA_DB_NAME=gitea +GITEA_DB_USER=gitea +GITEA_DB_PASSWORD= # Jenkins JENKINS_ADMIN_USER=admin JENKINS_ADMIN_PASSWORD= + +# MLflow (uses sql backend) +MLFLOW_DB_NAME=mlflow +MLFLOW_DB_USER=mlflow +MLFLOW_DB_PASSWORD= + +# JupyterHub +JUPYTER_NOTEBOOK_IMAGE=jupyter/datascience-notebook:latest +JUPYTERHUB_ADMIN_USERS= diff --git a/cloud/compose.yml b/cloud/compose.yml index 7fb1f8a..61d0d95 100644 --- a/cloud/compose.yml +++ b/cloud/compose.yml @@ -4,20 +4,29 @@ name: cloud -# Uncomment includes as each stack is scaffolded with real services. +# Uncomment includes as each stack is hardened beyond stub. 
include: + # Core ingress + identity # - ../stacks/nginx-proxy/compose.yml # - ../stacks/wireguard-server/compose.yml # - ../stacks/keycloak/compose.yml # - ../stacks/portainer/compose.yml + # Data + # - ../stacks/sql/compose.yml # - ../stacks/influxdb/compose.yml - # - ../stacks/grafana/compose.yml + # Apps # - ../stacks/node-red/compose.yml - # - ../stacks/mqtt/compose.yml - # - ../stacks/postfix/compose.yml + # - ../stacks/grafana/compose.yml # - ../stacks/gitea/compose.yml # - ../stacks/jenkins/compose.yml - # - ../stacks/sql/compose.yml + # Messaging + mail + # - ../stacks/rabbitmq/compose.yml + # - ../stacks/postfix/compose.yml + # ML / notebooks + # - ../stacks/mlflow/compose.yml + # - ../stacks/jupyterhub/compose.yml + # FROST (when deployed) + # - ../stacks/mosquitto/compose.yml networks: edge: @@ -29,7 +38,7 @@ networks: data: name: cloud-data driver: bridge - internal: true # databases — no internet egress + internal: true # databases — no internet egress mgmt: name: cloud-mgmt driver: bridge diff --git a/docs/architecture.md b/docs/architecture.md index 2e2654e..4a57050 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -13,21 +13,23 @@ R&D infrastructure for Waterschap Brabantse Delta. Hub-and-spoke topology: │ tcp/80, 443, 8883 │ │ udp/51820 │ ▼ │ -┌───────────────────────────────────┐ -│ Cloud (central, one) │ -│ nginx-proxy ◀── 80/443/8883 │ -│ wireguard-server ◀── 51820/udp │ -│ gitea, jenkins, keycloak, ... │ -│ influxdb, grafana, node-red │ -│ mqtt, postfix, portainer │ -│ sql (single point of config) │ -└───────────────┬───────────────────┘ +┌────────────────────────────────────┐ +│ Cloud (central, one) │ +│ nginx + certbot ◀── 80/443/8883 │ +│ wireguard-server ◀── 51820/udp │ +│ gitea, jenkins, keycloak, ... 
│ +│ influxdb, grafana, node-red │ +│ rabbitmq, postfix, portainer │ +│ sql (postgres, single config) │ +│ mlflow, jupyterhub │ +│ mosquitto (FROST stack only) │ +└───────────────┬────────────────────┘ │ WireGuard tunnels ┌───────┼────────┬───────────┐ ▼ ▼ ▼ ▼ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ │ Edge: │ │ Edge: │ │ Edge: │ │ ... │ - │plant1 │ │plant2 │ │plant3 │ │ │ + │gemaal1│ │ ... │ │ ... │ │ │ └───┬───┘ └───────┘ └───────┘ └───────┘ │ TLS ▼ @@ -45,25 +47,27 @@ Each layer uses **four internal Docker networks**: | Network | Purpose | Notes | |---|---|---| | `edge` | Outermost. Cloud: internet-facing. Edge: plant-LAN-facing. | Only port-publishers join. | -| `app` | Application / automation tier (node-red, grafana, jenkins, gitea, …). | Default landing for app services. | +| `app` | Application / automation tier. | Default landing for app services. | | `data` | Databases (influxdb, sql). | `internal: true` — no internet egress. | | `mgmt` | Identity, control plane (portainer, keycloak admin, wireguard mgmt). | Restricted. | ### Cloud attachments ``` -edge : nginx-proxy, wireguard-server -app : nginx-proxy, mqtt, postfix, node-red, grafana, - jenkins, gitea, keycloak -data : influxdb, sql, grafana -mgmt : portainer, keycloak, wireguard-server +edge : nginx, wireguard-server +app : nginx, rabbitmq, postfix, node-red, grafana, + jenkins, gitea, keycloak, mlflow, jupyterhub +data : influxdb, sql, grafana, mlflow +mgmt : portainer, keycloak, wireguard-server, jupyterhub ``` +(`mosquitto` joins `app` only when the FROST stack is deployed.) 
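The cloud attachment block can be audited mechanically rather than by eye. Below is a minimal sketch that assumes the block has been hand-copied into an `ATTACHMENTS` string. In practice that string could be generated from `docker network inspect`. `peers_of` is a hypothetical helper, not part of the repo. It lists every service that shares at least one network with the service you name.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper. ATTACHMENTS is hand-copied from the
# "Cloud attachments" block (network: services); mosquitto is left out
# because it only joins `app` when the FROST stack is deployed.
ATTACHMENTS='edge: nginx wireguard-server
app: nginx rabbitmq postfix node-red grafana jenkins gitea keycloak mlflow jupyterhub
data: influxdb sql grafana mlflow
mgmt: portainer keycloak wireguard-server jupyterhub'

# peers_of SERVICE
# Prints (sorted, one per line) every other service that shares at least one
# network with SERVICE, i.e. its direct network reachability.
peers_of() {
  local target=$1 net members
  while IFS=: read -r net members; do
    # Only expand the networks SERVICE is attached to (whole-word match).
    case " $members " in
      *" $target "*) printf '%s\n' $members ;;  # unquoted on purpose: split on spaces
    esac
  done <<< "$ATTACHMENTS" | { grep -vx -- "$target" || true; } | LC_ALL=C sort -u
}
```

With this table, `peers_of node-red` does not include `influxdb`, which is the blast-radius property. `grafana` and `mlflow` do appear, because each sits on both `app` and `data`, so those two are the indirect paths worth reviewing. The check covers only direct network adjacency. It does not show what a service can reach through another service.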
+ ### Edge attachments ``` -edge : nginx-proxy ← plant-LAN-facing -app : nginx-proxy, mqtt, postfix, node-red, +edge : nginx ← plant-LAN-facing +app : nginx, rabbitmq, postfix, node-red, grafana, keycloak, wireguard-client data : influxdb, grafana mgmt : portainer, keycloak, wireguard-client @@ -75,9 +79,9 @@ mgmt : portainer, keycloak, wireguard-client | Port | Container | Notes | |---|---|---| -| `tcp/80` | nginx-proxy | HTTP → 301 to 443 | -| `tcp/443` | nginx-proxy | All HTTPS UIs; TLS termination | -| `tcp/8883` | nginx-proxy | MQTT-TLS via `stream {}` block, SNI route to broker | +| `tcp/80` | nginx | HTTP → 301 to 443; also serves `/.well-known/acme-challenge/` for certbot | +| `tcp/443` | nginx | All HTTPS UIs; TLS termination | +| `tcp/8883` | nginx | MQTT-TLS via `stream {}` block; SNI route to `rabbitmq:1883` | | `udp/51820` | wireguard-server | VPN tunnel ingress | Two containers publish a total of four ports. **Everything else is invisible** from outside the host. @@ -86,34 +90,63 @@ Two containers publish a total of four ports. **Everything else is invisible** f | Port | Container | Bound to | |---|---|---| -| `tcp/80` | nginx-proxy | Plant-LAN interface only | -| `tcp/443` | nginx-proxy | Plant-LAN interface only | +| `tcp/80` | nginx | Plant-LAN interface only | +| `tcp/443` | nginx | Plant-LAN interface only | -The edge `wireguard-client` initiates outbound to the cloud — it publishes **no port**. On-site operators reach SCADA on the plant LAN; remote ops reach the same nginx via the WG tunnel. +The edge `wireguard-client` initiates outbound to the cloud — it publishes **no port**. + +## TLS strategy + +**Stock nginx + certbot sidecar** (Let's Encrypt, HTTP-01 webroot). + +- Stock `nginx:1.27-alpine` — required because we use the `stream {}` context for MQTT-TLS. `nginxproxy/nginx-proxy` (the jwilder image) is HTTP/HTTPS-only and can't expose stream cleanly. +- `certbot/certbot` sidecar runs `certbot renew` every 12h. 
Shared `nginx-certs` + `nginx-acme-challenge` volumes coordinate cert + challenge state between the two containers. +- Initial issuance is **manual** (one-time `docker compose run --rm --entrypoint certbot certbot certonly --webroot …`; `--entrypoint` is needed because the service overrides it with `/bin/sh -c` for the renew loop). Renewal is automatic, but nginx only reads certificates at start-up or reload, so it also needs a periodic `nginx -s reload` to serve renewed certs. + +For cloud-internal hostnames not reachable via Let's Encrypt HTTP-01, the longer-term plan is a small internal PKI (step-ca or similar) backed by `sql`. Out of scope for first deploy. ## Why segment - **Blast radius**: a compromised node-red on `app` cannot reach influxdb on `data` unless an explicit attachment is declared. Each service's reachability is auditable from `networks:` alone. -- **Defense in depth**: only nginx-proxy and wireguard-server bind host ports. No accidental `0.0.0.0` exposures. -- **NIS2 / utility audit**: WBD is in scope as water-sector. Compose networks are a cheap way to evidence segmentation at runtime and on paper. +- **Defense in depth**: only nginx and wireguard-server bind host ports. No accidental `0.0.0.0` exposures. +- **NIS2 / utility audit**: WBD is in scope as a water-sector entity. Compose networks evidence segmentation at runtime and on paper. ## Special cases ### Postfix (cloud + edge) -The diagram labels it "OUT ONLY". Postfix initiates outbound SMTP to internet MX servers but accepts **no inbound** mail. So Postfix has zero ingress, no published port, no listener facing the internet. It just needs egress (which every container has via host NAT). +Postfix is **outbound-only**. It initiates SMTP to internet MX servers but accepts no inbound mail. Zero ingress, no published port, no listener facing the internet. It only needs egress, which any container on a non-`internal` network gets via host NAT. -### MQTT (cloud) +### MQTT — two brokers -nginx-proxy `stream {}` block reverse-proxies `tcp/8883` to the internal broker via SNI. The broker has **no published port**. Cleanest "everything through nginx" model. +- **RabbitMQ** is the **general-purpose** broker. Runs at both cloud and edge. MQTT plugin enabled.
Cloud-side reachable externally via nginx stream proxy on `tcp/8883`. Edge-side fully internal. +- **Mosquitto** is reserved for the **FROST (SensorThings API) stack** only — cloud-only. Internal to its own stack — no external ingress unless FROST publishers need to push from outside (in which case use a separate stream block on a different port). -### MQTT (edge) +If FROST needs cross-broker forwarding, bridge the two over MQTT, e.g. a Mosquitto bridge (`connection` block in `mosquitto.conf`) to RabbitMQ's MQTT listener. RabbitMQ's `shovel` plugin speaks AMQP, not MQTT, so it cannot attach to `mosquitto` directly. Not wired up by default. -Broker is **fully internal** to the edge stack — no plant-LAN ingress. Node-RED on edge bridges OPCUA → broker, then broker → cloud broker over the WG tunnel. Field devices that need MQTT publish to the cloud broker via WG, not to the edge broker directly. +### Gitea — HTTPS only + +No SSH ingress. `GITEA__server__DISABLE_SSH=true`. All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners require SSH push. ### WireGuard server (cloud) -WireGuard is connectionless UDP with crypto-routed packets. It cannot be sensibly reverse-proxied (NAT/MTU break, no security benefit). The server publishes `udp/51820` directly — the **only** non-nginx public ingress on cloud. +WireGuard is connectionless UDP with crypto-routed packets. Proxying through nginx-stream breaks NAT/MTU and adds no security benefit. The server publishes `udp/51820` directly — the **only** non-nginx public ingress on cloud. + +## Stacks + +The repo defines **16 stacks** under `stacks/`: + +- **Cloud + edge**: `nginx-proxy`, `node-red`, `influxdb`, `grafana`, `keycloak`, `portainer`, `rabbitmq`, `postfix` +- **Cloud-only**: `wireguard-server`, `gitea` (HTTPS), `jenkins`, `sql` (postgres), `mlflow`, `jupyterhub`, `mosquitto` (FROST) +- **Edge-only**: `wireguard-client` + +## Sites + +| Site | Status | +|---|---| +| `gemaal1` | Scaffolded; awaiting hardware provisioning (PLANT_LAN_IP, WG peer key, OPCUA endpoint) | + +Additional plants follow the same pattern under `sites//`.
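`sites/gemaal1/compose.yml` describes itself as the template for additional plants. Below is a minimal sketch of the copy step. It assumes the only plant-specific tokens are the `gemaal1`/`Gemaal1` strings in `compose.yml` and `.env.example`: the project name, network names, `INFLUX_BUCKET` and `RABBITMQ_VHOST`. `new_site` is a hypothetical helper. The hardware section of the copied `README.md` still has to be edited by hand.

```bash
#!/usr/bin/env bash
# new_site NAME [SITES_DIR]
# Hypothetical helper: clone sites/gemaal1 into sites/NAME and rewrite the
# gemaal1-specific identifiers in compose.yml and .env.example.
new_site() {
  local name=$1 sites=${2:-sites}
  local src="$sites/gemaal1" dst="$sites/$name" f

  # Kebab-case names only, to match the repo's folder convention and keep
  # the sed replacement below free of special characters.
  case $name in
    '' | *[!a-z0-9-]*) echo "new_site: invalid name '$name'" >&2; return 1 ;;
  esac
  if [ -e "$dst" ]; then
    echo "new_site: $dst already exists" >&2
    return 1
  fi

  cp -R "$src" "$dst" || return 1
  rm -f "$dst/.env"  # never carry over a filled-in secrets file

  for f in compose.yml .env.example; do
    if [ -f "$dst/$f" ]; then
      # -i.bak works with both GNU and BSD sed; drop the backup afterwards
      sed -i.bak -e "s/gemaal1/$name/g" -e "s/Gemaal1/$name/g" "$dst/$f" &&
        rm -f "$dst/$f.bak"
    fi
  done
}
```

For example, `new_site gemaal2` (a hypothetical plant name) run from the repo root would create `sites/gemaal2/` with project `edge-gemaal2` and networks `edge-gemaal2-*`. It refuses to overwrite an existing site and drops any copied `.env`.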
## Conventions @@ -126,10 +159,12 @@ WireGuard is connectionless UDP with crypto-routed packets. It cannot be sensibl ## Open decisions -These are deferred until we build the respective stack. Tracked here so we don't forget. +Tracked here so we don't forget. Each lands when we harden the relevant stack. -- **SQL flavor**: postgres / mariadb / mysql? Leaning postgres for the "single point of config" use case. -- **SSL strategy**: certbot inside nginx-proxy, acme-companion sidecar, or step-ca-driven internal PKI? Probably acme-companion against Let's Encrypt for external endpoints + internal PKI for service-to-service. -- **Keycloak storage**: bundled H2 (dev only) vs external SQL backend (probably the same `sql` stack). -- **Backup strategy** for `data` (influxdb, sql) and `mgmt` (gitea, jenkins workspaces). -- **First site**: which plant gets `sites//` scaffolded first? +- **MinIO / artifact store** — MLflow uses local volume for now; switch to S3-compatible MinIO sidecar when artifacts grow. +- **JupyterHub auth** — target Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`. +- **WG client routing** — split-tunnel vs full; per-peer `AllowedIPs` policy. +- **MQTT cross-broker shovel** — only if FROST consumers must see RabbitMQ traffic or vice versa. +- **Internal PKI** — for cloud-internal hostnames not eligible for Let's Encrypt HTTP-01. +- **Backup strategy** — for `sql` (postgres), `influxdb`, `gitea-data`, `jenkins-home`, `mlflow-artifacts`. +- **Provision Gemaal1** — fill in `PLANT_LAN_IP`, WG peer key, OPCUA endpoint, deploy first stacks. 
diff --git a/sites/gemaal1/.env.example b/sites/gemaal1/.env.example new file mode 100644 index 0000000..131421e --- /dev/null +++ b/sites/gemaal1/.env.example @@ -0,0 +1,32 @@ +TZ=Europe/Amsterdam + +# Plant LAN interface IP that nginx-proxy binds to (replaces 0.0.0.0) +PLANT_LAN_IP= + +# WireGuard client peer config: see stacks/wireguard-client/config/wg0.conf + +# InfluxDB (local DB at this plant) +INFLUX_ADMIN_USER=admin +INFLUX_ADMIN_PASSWORD= +INFLUX_ADMIN_TOKEN= +INFLUX_ORG=wbd +INFLUX_BUCKET=gemaal1 + +# Grafana (local SCADA at this plant) +GRAFANA_ADMIN_USER=admin +GRAFANA_ADMIN_PASSWORD= +GRAFANA_ROOT_URL= + +# Keycloak (local realm) +KEYCLOAK_ADMIN=admin +KEYCLOAK_ADMIN_PASSWORD= +KEYCLOAK_HOSTNAME= + +# RabbitMQ (general broker, edge-local — internal only) +RABBITMQ_USER=admin +RABBITMQ_PASSWORD= +RABBITMQ_VHOST=gemaal1 + +# Postfix +POSTFIX_RELAYHOST= +POSTFIX_FROM_DOMAIN= diff --git a/sites/gemaal1/README.md b/sites/gemaal1/README.md new file mode 100644 index 0000000..cc8a464 --- /dev/null +++ b/sites/gemaal1/README.md @@ -0,0 +1,37 @@ +# gemaal1 + +Edge deployment for pumping station **Gemaal1** — first production site. + +## Hardware (fill in when provisioned) + +- Edge gateway model: ? +- Plant LAN subnet: ?.?.?.0/24 +- WAN: ? +- OT VLAN (PLC + OPCUA): ?.?.?.0/24 +- OPCUA endpoint: opc.tcp://? + +## What runs here + +nginx-proxy (plant-LAN-facing, certbot for TLS), wireguard-client (outbound tunnel to cloud), keycloak (local realm), portainer, influxdb (local DB), grafana (local SCADA), node-red, rabbitmq (general broker, internal only), postfix. + +## Run + +```bash +cp .env.example .env # fill in real secrets + PLANT_LAN_IP +docker compose up -d +docker compose ps +``` + +## Ingress + +| Port | Bound to | +|---|---| +| tcp/80, 443 | `${PLANT_LAN_IP}` only | + +Remote ops reach the same nginx via the WireGuard tunnel from cloud (no extra port published). + +## OT uplink + +Node-RED + EVOLV nodes talk to the OPCUA server on the OT VLAN. 
The edge gateway must have a NIC on that VLAN. OPCUA + PLC are **managed outside this repo**. + +See [`../../docs/architecture.md`](../../docs/architecture.md) for the full topology. diff --git a/sites/gemaal1/compose.yml b/sites/gemaal1/compose.yml new file mode 100644 index 0000000..8e288e5 --- /dev/null +++ b/sites/gemaal1/compose.yml @@ -0,0 +1,40 @@ +# Edge deployment for pumping station "Gemaal1". +# First production site; template for additional plants. +# Run: cp .env.example .env && docker compose up -d + +name: edge-gemaal1 + +# Uncomment as each stack is hardened beyond stub. +include: + # - ../../stacks/nginx-proxy/compose.yml + # - ../../stacks/wireguard-client/compose.yml + # - ../../stacks/keycloak/compose.yml + # - ../../stacks/portainer/compose.yml + # - ../../stacks/influxdb/compose.yml + # - ../../stacks/grafana/compose.yml + # - ../../stacks/node-red/compose.yml + # - ../../stacks/rabbitmq/compose.yml + # - ../../stacks/postfix/compose.yml + +# Site-specific overrides — bind nginx to the plant-LAN interface only. +# Uncomment once nginx-proxy is wired up and PLANT_LAN_IP is set in .env. +# services: +# nginx: +# ports: +# - "${PLANT_LAN_IP}:80:80" +# - "${PLANT_LAN_IP}:443:443" + +networks: + edge: + name: edge-gemaal1-edge + driver: bridge + app: + name: edge-gemaal1-app + driver: bridge + data: + name: edge-gemaal1-data + driver: bridge + internal: true + mgmt: + name: edge-gemaal1-mgmt + driver: bridge diff --git a/stacks/gitea/.env.example b/stacks/gitea/.env.example index e32641c..ee59d02 100644 --- a/stacks/gitea/.env.example +++ b/stacks/gitea/.env.example @@ -1,5 +1,4 @@ GITEA_ROOT_URL= -GITEA_DB_TYPE=postgres GITEA_DB_HOST=sql:5432 GITEA_DB_NAME=gitea GITEA_DB_USER=gitea diff --git a/stacks/gitea/README.md b/stacks/gitea/README.md index 2e73572..322da66 100644 --- a/stacks/gitea/README.md +++ b/stacks/gitea/README.md @@ -1,7 +1,8 @@ # gitea -Self-hosted git server. 
**Cloud-only stack.** (External repos live at `gitea.wbd-rd.nl`; this is for R&D internal use.) +Self-hosted git server. **Cloud-only.** (External repos live at `gitea.wbd-rd.nl`; this is for R&D internal use.) -- **Networks**: `app` (UI) + `data` (DB backend in `sql` stack) +- **Networks**: `app` (UI) + `data` (postgres backend in `sql` stack) - **Volume**: `gitea-data` (repos + LFS + actions runners) -- **TODO**: SSH access strategy (nginx stream proxy port 22, or skip and use HTTPS-only), Keycloak OIDC, runners for Gitea Actions +- **SSH ingress**: **disabled** (`DISABLE_SSH=true`). All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners need SSH push semantics. +- **TODO**: Keycloak OIDC, Gitea Actions runners, LFS storage policy diff --git a/stacks/gitea/compose.yml b/stacks/gitea/compose.yml index 7b72b65..bdb8fff 100644 --- a/stacks/gitea/compose.yml +++ b/stacks/gitea/compose.yml @@ -1,5 +1,6 @@ # gitea — git server (cloud only) # Networks: app + data (uses sql stack as DB backend) +# Ingress: HTTPS-only via nginx-proxy. No SSH port published. 
services: gitea: @@ -10,7 +11,8 @@ services: USER_UID: "1000" USER_GID: "1000" GITEA__server__ROOT_URL: ${GITEA_ROOT_URL} - GITEA__database__DB_TYPE: ${GITEA_DB_TYPE:-postgres} + GITEA__server__DISABLE_SSH: "true" + GITEA__database__DB_TYPE: postgres GITEA__database__HOST: ${GITEA_DB_HOST:-sql:5432} GITEA__database__NAME: ${GITEA_DB_NAME:-gitea} GITEA__database__USER: ${GITEA_DB_USER} @@ -18,7 +20,6 @@ services: TZ: ${TZ:-Europe/Amsterdam} volumes: - gitea-data:/data - # SSH port: TODO — decide if Gitea SSH (22/2222) is exposed via nginx stream or skipped networks: app: diff --git a/stacks/jupyterhub/.env.example b/stacks/jupyterhub/.env.example new file mode 100644 index 0000000..2244876 --- /dev/null +++ b/stacks/jupyterhub/.env.example @@ -0,0 +1,5 @@ +# Image spawned per user (override to a custom EVOLV image with mlflow client preinstalled) +JUPYTER_NOTEBOOK_IMAGE=jupyter/datascience-notebook:latest + +# Admin users (comma-separated, consumed by jupyterhub_config.py) +JUPYTERHUB_ADMIN_USERS= diff --git a/stacks/jupyterhub/README.md b/stacks/jupyterhub/README.md new file mode 100644 index 0000000..9a42f95 --- /dev/null +++ b/stacks/jupyterhub/README.md @@ -0,0 +1,16 @@ +# jupyterhub + +Multi-user JupyterHub. Each authenticated user gets their own notebook container via DockerSpawner. 
**Cloud-only.** + +- **Networks**: `app` (UI proxied at `/jupyter` or subdomain) + `mgmt` (control-plane tier; DockerSpawner reaches the Docker daemon through the `/var/run/docker.sock` bind mount, not through this network) +- **Spawned user containers** land on the `cloud-app` network so they can reach mlflow, influxdb (via grafana proxy), rabbitmq +- **Config**: `config/jupyterhub_config.py` — DockerSpawner setup, authenticator, admin list, resource limits + +## TODO + +- DockerSpawner install + config (image, network, user volumes, idle culling); the stock `jupyterhub/jupyterhub` image does not ship `dockerspawner`, so this needs a derived image +- Keycloak OAuth via `oauthenticator.generic.GenericOAuthenticator` +- Build a project-specific notebook image with EVOLV libs + mlflow client + InfluxDB client preinstalled +- Per-user persistent volume mounted at `/home/jovyan/work` +- CPU / memory limits per user container +- Cull idle servers (`c.JupyterHub.services` cull-idle pattern) diff --git a/stacks/jupyterhub/compose.yml b/stacks/jupyterhub/compose.yml new file mode 100644 index 0000000..4c1356b --- /dev/null +++ b/stacks/jupyterhub/compose.yml @@ -0,0 +1,25 @@ +# jupyterhub — multi-user notebook server (cloud only) +# Networks: app (UI proxied by nginx) + mgmt (control plane; Docker access comes from the socket bind mount) + +services: + jupyterhub: + image: jupyterhub/jupyterhub:5 + restart: unless-stopped + networks: [app, mgmt] + volumes: + - jupyterhub-data:/srv/jupyterhub + - /var/run/docker.sock:/var/run/docker.sock + - ./config/jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro + environment: + TZ: ${TZ:-Europe/Amsterdam} + DOCKER_NOTEBOOK_IMAGE: ${JUPYTER_NOTEBOOK_IMAGE:-jupyter/datascience-notebook:latest} + DOCKER_NETWORK_NAME: cloud-app + # TODO: DockerSpawner config in jupyterhub_config.py; Keycloak GenericOAuthenticator; + # preinstalled libraries; per-user persistent volumes; CPU/memory limits + +networks: + app: + mgmt: + +volumes: + jupyterhub-data: diff --git a/stacks/mlflow/.env.example b/stacks/mlflow/.env.example new file mode 100644 index 0000000..c67a87d --- /dev/null +++ b/stacks/mlflow/.env.example @@ -0,0 +1,3 @@ +MLFLOW_DB_NAME=mlflow
+MLFLOW_DB_USER=mlflow +MLFLOW_DB_PASSWORD= diff --git a/stacks/mlflow/README.md b/stacks/mlflow/README.md new file mode 100644 index 0000000..deb4e4d --- /dev/null +++ b/stacks/mlflow/README.md @@ -0,0 +1,12 @@ +# mlflow + +MLflow tracking server + model registry. Used by data scientists running experiments from JupyterHub or local laptops. **Cloud-only.** + +- **Networks**: `app` (UI on port 5000, reverse-proxied at `/mlflow` or subdomain) + `data` (postgres backend in `sql`) +- **Backend store**: postgres database `mlflow` — must be provisioned by `sql/config/init.d/` +- **Artifact store**: local volume `mlflow-artifacts`, served through the tracking server (`--serve-artifacts` + `--artifacts-destination`) so clients upload over HTTP instead of writing to their own filesystem. Switch to S3/MinIO when artifact volume grows beyond a few GB. +- **TODO**: + - Provision `mlflow` DB + role in `sql` init scripts + - Keycloak OIDC via nginx `auth_request` (MLflow's built-in auth is basic-auth only, via `--app-name basic-auth`, so SSO must be front-ended) + - MinIO sidecar for S3-compatible artifact store + - Retention / cleanup policy for stale runs diff --git a/stacks/mlflow/compose.yml b/stacks/mlflow/compose.yml new file mode 100644 index 0000000..9dc0479 --- /dev/null +++ b/stacks/mlflow/compose.yml @@ -0,0 +1,27 @@ +# mlflow — experiment tracking + model registry (cloud only) +# Networks: app (UI on 5000, proxied by nginx) + data (postgres backend on sql stack) + +services: + mlflow: + image: ghcr.io/mlflow/mlflow:v2.18.0 + restart: unless-stopped + networks: [app, data] + command: > + mlflow server + --host 0.0.0.0 + --port 5000 + --backend-store-uri postgresql://${MLFLOW_DB_USER}:${MLFLOW_DB_PASSWORD}@sql:5432/${MLFLOW_DB_NAME} + --artifacts-destination /mlflow/artifacts + --serve-artifacts + volumes: + - mlflow-artifacts:/mlflow/artifacts + environment: + TZ: ${TZ:-Europe/Amsterdam} + # TODO: switch artifact store to S3/MinIO; Keycloak OIDC via nginx auth_request + +networks: + app: + data: + +volumes: + mlflow-artifacts: diff --git a/stacks/mosquitto/.env.example b/stacks/mosquitto/.env.example new file mode 100644 index 0000000..adf75ef --- /dev/null +++
b/stacks/mosquitto/.env.example @@ -0,0 +1 @@ +# mosquitto — broker uses config file, no env vars in stub diff --git a/stacks/mosquitto/README.md b/stacks/mosquitto/README.md new file mode 100644 index 0000000..4e1aab6 --- /dev/null +++ b/stacks/mosquitto/README.md @@ -0,0 +1,11 @@ +# mosquitto + +Eclipse Mosquitto MQTT broker. **Reserved for the FROST (SensorThings API) stack** — separate from the general-purpose `rabbitmq` broker. **Cloud-only.** + +- **Network**: `app` (internal only — FROST services connect via service name `mosquitto`) +- **No external ingress** by default. If FROST needs external MQTT publishers, route them through a separate nginx stream block on a different port (not 8883 — that belongs to rabbitmq). +- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence +- **TODO**: + - ACL aligned with FROST topic structure + - Persistence retention policy + - Optional shovel from `rabbitmq` if cross-broker forwarding is needed diff --git a/stacks/mosquitto/compose.yml b/stacks/mosquitto/compose.yml new file mode 100644 index 0000000..f7ee8b4 --- /dev/null +++ b/stacks/mosquitto/compose.yml @@ -0,0 +1,22 @@ +# mosquitto — MQTT broker reserved for the FROST (SensorThings) stack +# Cloud-only. Internal to its own stack; no external ingress by default. +# Networks: app + +services: + mosquitto: + image: eclipse-mosquitto:2.0 + restart: unless-stopped + networks: [app] + volumes: + - ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro + - mosquitto-data:/mosquitto/data + - mosquitto-log:/mosquitto/log + # No 'ports:' — FROST is the only intended consumer. If external MQTT + # access for FROST is needed later, add a separate nginx stream block. 
+ +networks: + app: + +volumes: + mosquitto-data: + mosquitto-log: diff --git a/stacks/mqtt/.env.example b/stacks/mqtt/.env.example deleted file mode 100644 index 6f40ef4..0000000 --- a/stacks/mqtt/.env.example +++ /dev/null @@ -1,2 +0,0 @@ -# mqtt — broker uses config file, not env -# GUI vars land here once a GUI image is chosen diff --git a/stacks/mqtt/README.md b/stacks/mqtt/README.md deleted file mode 100644 index 96ff820..0000000 --- a/stacks/mqtt/README.md +++ /dev/null @@ -1,8 +0,0 @@ -# mqtt - -Eclipse Mosquitto broker. Cloud-side accepts external connections via nginx stream proxy. Edge-side is fully internal. - -- **Network**: `app` (no published port — nginx-proxy fronts external traffic on cloud) -- **Edge note**: on edge stacks, broker stays purely internal; node-red bridges OPCUA → broker → cloud broker over WG -- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence -- **TODO**: ACL policy, bridge config to cloud broker, GUI choice (mqtt-explorer / hivemq web / custom) diff --git a/stacks/mqtt/compose.yml b/stacks/mqtt/compose.yml deleted file mode 100644 index 063e36f..0000000 --- a/stacks/mqtt/compose.yml +++ /dev/null @@ -1,24 +0,0 @@ -# mqtt — MQTT broker (Eclipse Mosquitto) + optional GUI -# Networks: app (no port published — reached via nginx stream proxy on 8883) - -services: - mqtt-broker: - image: eclipse-mosquitto:2.0 - restart: unless-stopped - networks: [app] - volumes: - - ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro - - mqtt-data:/mosquitto/data - - mqtt-log:/mosquitto/log - # No 'ports:' — nginx-proxy stream-proxies external 8883 to internal 1883/8883 - - # mqtt-gui: # TODO: choose a GUI image (mqtt-explorer? hivemq web client? custom) - # image: ... 
- # networks: [app] - -networks: - app: - -volumes: - mqtt-data: - mqtt-log: diff --git a/stacks/nginx-proxy/.env.example b/stacks/nginx-proxy/.env.example index 6ea62c6..a7928ef 100644 --- a/stacks/nginx-proxy/.env.example +++ b/stacks/nginx-proxy/.env.example @@ -1,2 +1,4 @@ -# nginx-proxy — config-file-driven, no env vars in stub -# Domain + cert settings will land here once SSL strategy is chosen +LETSENCRYPT_EMAIL= +# Production CA: https://acme-v02.api.letsencrypt.org/directory +# Staging CA (testing): https://acme-staging-v02.api.letsencrypt.org/directory +ACME_CA_URI=https://acme-v02.api.letsencrypt.org/directory diff --git a/stacks/nginx-proxy/README.md b/stacks/nginx-proxy/README.md index 2a00b15..a441e42 100644 --- a/stacks/nginx-proxy/README.md +++ b/stacks/nginx-proxy/README.md @@ -1,12 +1,47 @@ # nginx-proxy -The single web ingress. Reverse-proxies HTTPS UIs and stream-proxies MQTT-TLS. +The single web ingress for cloud + edge. Reverse-proxies HTTPS UIs and stream-proxies MQTT-TLS to RabbitMQ. TLS certificates managed by a certbot sidecar (Let's Encrypt, HTTP-01 webroot challenge). 
+- **Image**: stock `nginx:1.27-alpine` (we don't use `nginxproxy/nginx-proxy` because we need the `stream {}` context for MQTT-TLS, which that image doesn't expose cleanly) +- **Sidecar**: `certbot/certbot:latest` — renews every 12h, shared `nginx-certs` + `nginx-acme-challenge` volumes - **Networks**: `edge` (the only port-publisher) + `app` (talks to upstream services) - **Host ports**: `tcp/80`, `tcp/443`, `tcp/8883` -- **Config**: - - `config/nginx.conf` — base - - `config/conf.d/*.conf` — HTTP vhosts (one per upstream UI) - - `config/stream.d/mqtt.conf` — MQTT-TLS stream block, SNI route to mqtt broker - - `config/certs/` — TLS certs (volume-mounted from cert manager) -- **TODO**: pick SSL strategy (acme-companion sidecar vs certbot vs internal PKI), write vhost templates per upstream + +## Config layout + +``` +config/ +├── nginx.conf # base config — must include `stream {}` directive +├── conf.d/ # HTTP vhosts (one per upstream UI) +│ ├── grafana.conf +│ ├── node-red.conf +│ ├── gitea.conf +│ └── ... +└── stream.d/ + └── mqtt.conf # MQTT-TLS stream block, SNI route to rabbitmq:1883 +``` + +Volumes: +- `nginx-certs` — Let's Encrypt cert chains (`/etc/letsencrypt`), read-only mounted into nginx, writable from certbot +- `nginx-acme-challenge` — webroot for HTTP-01 challenges (`/var/www/certbot`) + +## Initial cert issuance + +1. Start with HTTP-only nginx config (serving `/.well-known/acme-challenge/`). +2. Issue (`--entrypoint certbot` bypasses the `/bin/sh -c` entrypoint the service uses for its renew loop): + ```bash + set -a; . ./.env; set +a # export LETSENCRYPT_EMAIL into this shell + docker compose run --rm --entrypoint certbot certbot certonly \ + --webroot -w /var/www/certbot \ + --email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \ + -d gitea.example.com -d grafana.example.com -d nodered.example.com + ``` +3. Drop HTTPS vhost configs into `config/conf.d/` and reload nginx. + +The sidecar then renews automatically. nginx only reads certificates at start-up or reload, so reload it after a renewal (`docker compose exec nginx nginx -s reload`).
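`.env.example` defines `ACME_CA_URI` (production vs staging CA), but the issuance command does not pass it to certbot. Below is a minimal sketch that assumes issuance runs from the stack directory. `certonly_args` is a hypothetical helper. It builds the argument list with certbot's `--server` flag and one `-d` per hostname. The `certbot` service overrides its entrypoint with `/bin/sh -c` for the renew loop, so the one-off run also passes `--entrypoint certbot`.

```bash
#!/usr/bin/env bash
# certonly_args EMAIL CA_URI DOMAIN...
# Hypothetical helper: print one certbot argument per line, so the list can be
# read into an array without word-splitting surprises.
certonly_args() {
  if [ "$#" -lt 3 ]; then
    echo "usage: certonly_args EMAIL CA_URI DOMAIN..." >&2
    return 1
  fi
  local email=$1 ca=$2 d
  shift 2
  local args=(certonly --webroot -w /var/www/certbot
              --email "$email" --agree-tos --no-eff-email
              --server "$ca")            # ACME directory: production or staging
  for d in "$@"; do
    args+=(-d "$d")                      # one -d per hostname on the certificate
  done
  printf '%s\n' "${args[@]}"
}

# Usage (bash 4+ for mapfile; commented out so sourcing this file is safe):
#   set -a; . ./.env; set +a            # export LETSENCRYPT_EMAIL, ACME_CA_URI
#   mapfile -t args < <(certonly_args "$LETSENCRYPT_EMAIL" "$ACME_CA_URI" \
#                         gitea.example.com grafana.example.com)
#   docker compose run --rm --entrypoint certbot certbot "${args[@]}"
```

Pointing `ACME_CA_URI` at the staging directory first avoids Let's Encrypt production rate limits while the vhosts are still being debugged.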
+ +## TODO + +- Write base `config/nginx.conf` (`http` + `stream` contexts) +- Per-upstream vhost templates with OIDC `auth_request` to Keycloak +- Decide internal PKI vs Let's Encrypt for cloud-internal hostnames not reachable from the public internet +- Edge-side variant: bind to plant-LAN IP only, internal CA for plant.local hostnames +- Automate the nginx reload after certbot renewals diff --git a/stacks/nginx-proxy/compose.yml b/stacks/nginx-proxy/compose.yml index 671c29b..7e92bc1 100644 --- a/stacks/nginx-proxy/compose.yml +++ b/stacks/nginx-proxy/compose.yml @@ -1,22 +1,44 @@ -# nginx-proxy — TLS reverse proxy (HTTPS + MQTT-TLS) +# nginx-proxy — TLS reverse proxy (HTTPS + MQTT-TLS stream proxy) +# Stock nginx + certbot sidecar for Let's Encrypt automation. # Networks: edge (port publisher) + app (proxy targets) # Publishes: 80, 443, 8883 on the host services: - nginx-proxy: + nginx: image: nginx:1.27-alpine restart: unless-stopped networks: [edge, app] ports: - "80:80" - "443:443" - - "8883:8883" # MQTT-TLS via stream{} block + - "8883:8883" # MQTT-TLS via stream{} block, SNI route to rabbitmq volumes: + - ./config/nginx.conf:/etc/nginx/nginx.conf:ro - ./config/conf.d:/etc/nginx/conf.d:ro - ./config/stream.d:/etc/nginx/stream.d:ro - - ./config/nginx.conf:/etc/nginx/nginx.conf:ro - - nginx-certs:/etc/nginx/certs:ro - # TODO: SSL strategy (acme-companion sidecar vs certbot vs internal PKI) + - nginx-certs:/etc/letsencrypt:ro + - nginx-acme-challenge:/var/www/certbot:ro + depends_on: + - certbot + + certbot: + image: certbot/certbot:latest + restart: unless-stopped + volumes: + - nginx-certs:/etc/letsencrypt + - nginx-acme-challenge:/var/www/certbot + entrypoint: /bin/sh -c + command: > + "trap exit TERM; + while :; do + certbot renew --webroot -w /var/www/certbot --quiet; + sleep 12h & wait $${!}; + done" + # Initial issuance is manual (--entrypoint bypasses the /bin/sh -c wrapper above): + # docker compose run --rm --entrypoint certbot certbot certonly \ + # --webroot -w /var/www/certbot \ + # --email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \ + # -d -d ...
 
 networks:
   edge:
@@ -24,3 +46,4 @@ networks:
 
 volumes:
   nginx-certs:
+  nginx-acme-challenge:
diff --git a/stacks/rabbitmq/.env.example b/stacks/rabbitmq/.env.example
new file mode 100644
index 0000000..2b3d2e0
--- /dev/null
+++ b/stacks/rabbitmq/.env.example
@@ -0,0 +1,3 @@
+RABBITMQ_USER=admin
+RABBITMQ_PASSWORD=
+RABBITMQ_VHOST=/
diff --git a/stacks/rabbitmq/README.md b/stacks/rabbitmq/README.md
new file mode 100644
index 0000000..8625583
--- /dev/null
+++ b/stacks/rabbitmq/README.md
@@ -0,0 +1,17 @@
+# rabbitmq
+
+General-purpose message broker. AMQP for app-to-app messaging; MQTT plugin for external MQTT-TLS clients (fronted by nginx-proxy stream block).
+
+Used at both cloud and edge. The `mosquitto` stack is reserved for the FROST SensorThings stack only — do not confuse the two.
+
+- **Network**: `app` — no published port
+- **External MQTT** clients reach `nginx-proxy:8883` (cloud), which stream-proxies to `rabbitmq:1883` internally. Edge brokers are internal-only.
+- **Management UI**: port 15672 → reverse-proxied through nginx-proxy
+- **Plugins to enable** in `config/enabled_plugins`:
+  ```
+  [rabbitmq_management, rabbitmq_mqtt, rabbitmq_web_mqtt].
+  ```
+- **TODO**:
+  - ACL policies
+  - VHost layout (cloud-bus, plant-buses, ml-bus)
+  - Shovel/federation to `mosquitto` if FROST needs MQTT bridge to general broker
diff --git a/stacks/rabbitmq/compose.yml b/stacks/rabbitmq/compose.yml
new file mode 100644
index 0000000..cadc7dc
--- /dev/null
+++ b/stacks/rabbitmq/compose.yml
@@ -0,0 +1,26 @@
+# rabbitmq — general-purpose message broker (AMQP + MQTT plugin)
+# Used at both cloud and edge. Cloud-side: external clients reach it via
+# nginx stream proxy on tcp/8883. Edge-side: internal only.
+# Networks: app
+services:
+  rabbitmq:
+    image: rabbitmq:3.13-management
+    restart: unless-stopped
+    networks: [app]
+    environment:
+      RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER}
+      RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}
+      RABBITMQ_DEFAULT_VHOST: ${RABBITMQ_VHOST:-/}
+      TZ: ${TZ:-Europe/Amsterdam}
+    volumes:
+      - rabbitmq-data:/var/lib/rabbitmq
+      - ./config/enabled_plugins:/etc/rabbitmq/enabled_plugins:ro
+      - ./config/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
+    # No 'ports:' — nginx-proxy stream-proxies external 8883 to internal 1883
+
+networks:
+  app:
+
+volumes:
+  rabbitmq-data:
diff --git a/stacks/sql/README.md b/stacks/sql/README.md
index 3d23b6a..edfe811 100644
--- a/stacks/sql/README.md
+++ b/stacks/sql/README.md
@@ -1,11 +1,9 @@
 # sql
 
-Central configuration database — the "single point of config" backing Keycloak, Gitea, and other stacks that need a relational store. **Cloud-only stack.**
+Central configuration database — the "single point of config" backing Keycloak, Gitea, MLflow, and any stack that needs a relational store. **Cloud-only.**
 
+- **Engine**: postgres 16-alpine
 - **Network**: `data` only (no internet egress)
-- **Engine**: stub uses **postgres:16-alpine** pending decision (postgres vs mariadb vs mysql)
-- **Volume**: `sql-data`
-- **TODO**:
-  - confirm engine choice (likely postgres for keycloak + gitea compatibility)
-  - per-app database/role provisioning (init scripts in `config/init.d/`)
-  - backup strategy (pg_dump cron sidecar vs streaming replica)
+- **Volume**: `sql-data` (PGDATA)
+- **Init scripts**: `config/init.d/*.sql` runs on first start — provisions per-app databases/roles (gitea, keycloak, mlflow, …)
+- **TODO**: backup strategy (pg_dump cron sidecar vs streaming replica)
diff --git a/stacks/sql/compose.yml b/stacks/sql/compose.yml
index a1824f1..b03ad10 100644
--- a/stacks/sql/compose.yml
+++ b/stacks/sql/compose.yml
@@ -1,6 +1,5 @@
-# sql — single point of config DB (cloud only)
+# sql — central config DB (postgres, cloud only)
 # Networks: data (no internet egress)
-# TBD: postgres / mariadb / mysql — stub uses postgres pending decision
 
 services:
   sql:
@@ -14,6 +13,7 @@ services:
       TZ: ${TZ:-Europe/Amsterdam}
     volumes:
       - sql-data:/var/lib/postgresql/data
+      - ./config/init.d:/docker-entrypoint-initdb.d:ro
 
 networks:
   data:
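The per-app provisioning that `config/init.d/` is meant to hold can be generated instead of hand-written. A minimal sketch, assuming one login role and one database per app as suggested by the `GITEA_DB_*` and `MLFLOW_DB_*` variables in `cloud/.env.example`; the `app_db_sql` helper and the `10-apps.sql` file name are hypothetical. Passwords are single-quote-escaped for SQL string literals, and because the output contains secrets it belongs under gitignore like `.env`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print SQL that provisions one login role and one
# database owned by it. Names are assumed to be plain identifiers
# (e.g. gitea, mlflow); only the password is escaped.
app_db_sql() {
  local db="$1" user="$2" pass="$3"
  local esc_pass
  # Double any single quote so the password is a valid SQL string literal.
  esc_pass=$(printf '%s' "$pass" | sed "s/'/''/g")
  printf "CREATE ROLE \"%s\" LOGIN PASSWORD '%s';\n" "$user" "$esc_pass"
  printf 'CREATE DATABASE "%s" OWNER "%s";\n' "$db" "$user"
}

# Example (assumes cloud/.env holds plain KEY=value lines):
#   set -a; . cloud/.env; set +a
#   { app_db_sql "$GITEA_DB_NAME"  "$GITEA_DB_USER"  "$GITEA_DB_PASSWORD"
#     app_db_sql "$MLFLOW_DB_NAME" "$MLFLOW_DB_USER" "$MLFLOW_DB_PASSWORD"
#   } > stacks/sql/config/init.d/10-apps.sql
```

The postgres image only runs `/docker-entrypoint-initdb.d` when the data directory is empty, so an app added after the `sql-data` volume exists needs the same statements applied by hand, for example through `psql`.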