feat: SQL=postgres, nginx+certbot, MQTT split, ML stacks, gitea HTTPS-only, gemaal1 site

Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks.

Locked decisions
- sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning
- nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO).
  Chose stock nginx over nginxproxy/nginx-proxy because a stream {} context
  is required to reverse-proxy MQTT-TLS on tcp/8883 to rabbitmq:1883.
- gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published.

MQTT split
- Remove stacks/mqtt placeholder.
- Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin),
  used at both cloud and edge. External MQTT clients reach the cloud broker
  via the nginx stream proxy on tcp/8883.
- Add stacks/mosquitto — reserved for the FROST (SensorThings) stack
  only. Cloud-only. Internal to its own stack; no external ingress.

ML / notebooks (cloud-only)
- stacks/mlflow — experiment tracking + model registry. Postgres backend
  on sql stack; local volume for artifacts (S3/MinIO is a TODO).
- stacks/jupyterhub — multi-user notebook server. DockerSpawner via
  mounted docker.sock; user containers are spawned onto the cloud-app
  network so they can reach mlflow, influxdb (via grafana) and rabbitmq.

Sites
- sites/gemaal1 — first edge deployment scaffold. Site-local override
  template for binding nginx to PLANT_LAN_IP.

Docs
- README + docs/architecture.md updated: stacks table now lists 15 stacks,
  ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy
  section locked, MQTT-split section added, Gitea HTTPS-only noted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
znetsixe
2026-05-21 13:22:46 +02:00
parent 8ab9061983
commit 2f5e3b4183
30 changed files with 492 additions and 116 deletions

@@ -1,5 +1,4 @@
 GITEA_ROOT_URL=
-GITEA_DB_TYPE=postgres
 GITEA_DB_HOST=sql:5432
 GITEA_DB_NAME=gitea
 GITEA_DB_USER=gitea

@@ -1,7 +1,8 @@
 # gitea
-Self-hosted git server. **Cloud-only stack.** (External repos live at `gitea.wbd-rd.nl`; this is for R&D internal use.)
+Self-hosted git server. **Cloud-only.** (External repos live at `gitea.wbd-rd.nl`; this is for R&D internal use.)
-- **Networks**: `app` (UI) + `data` (DB backend in `sql` stack)
+- **Networks**: `app` (UI) + `data` (postgres backend in `sql` stack)
 - **Volume**: `gitea-data` (repos + LFS + actions runners)
-- **TODO**: SSH access strategy (nginx stream proxy port 22, or skip and use HTTPS-only), Keycloak OIDC, runners for Gitea Actions
+- **SSH ingress**: **disabled** (`DISABLE_SSH=true`). All clones over HTTPS via nginx-proxy. Re-evaluate only if Gitea Actions runners need SSH push semantics.
+- **TODO**: Keycloak OIDC, Gitea Actions runners, LFS storage policy

@@ -1,5 +1,6 @@
 # gitea — git server (cloud only)
 # Networks: app + data (uses sql stack as DB backend)
+# Ingress: HTTPS-only via nginx-proxy. No SSH port published.
 services:
   gitea:
@@ -10,7 +11,8 @@ services:
       USER_UID: "1000"
       USER_GID: "1000"
       GITEA__server__ROOT_URL: ${GITEA_ROOT_URL}
-      GITEA__database__DB_TYPE: ${GITEA_DB_TYPE:-postgres}
+      GITEA__server__DISABLE_SSH: "true"
+      GITEA__database__DB_TYPE: postgres
       GITEA__database__HOST: ${GITEA_DB_HOST:-sql:5432}
       GITEA__database__NAME: ${GITEA_DB_NAME:-gitea}
       GITEA__database__USER: ${GITEA_DB_USER}
@@ -18,7 +20,6 @@ services:
       TZ: ${TZ:-Europe/Amsterdam}
     volumes:
       - gitea-data:/data
-    # SSH port: TODO — decide if Gitea SSH (22/2222) is exposed via nginx stream or skipped
 networks:
   app:
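The `GITEA__<section>__<KEY>` variables in this compose file use the Gitea container's environment-to-ini convention: the segment after the first `__` names an `app.ini` section and the remainder names the key, so `GITEA__server__DISABLE_SSH: "true"` ends up as `DISABLE_SSH = true` under `[server]`. The sketch below is a simplified model of that mapping for illustration only, not Gitea's own converter; the function names are hypothetical, and the escape sequences the real tool decodes (for characters that cannot appear in variable names) are ignored.

```python
from collections import defaultdict


def env_to_ini_sections(env: dict[str, str], prefix: str = "GITEA__") -> dict[str, dict[str, str]]:
    """Group GITEA__<section>__<KEY> variables into {section: {KEY: value}}.

    Simplified model: escape sequences are not decoded, names are kept as written.
    """
    sections = defaultdict(dict)
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # e.g. USER_UID, TZ: container settings, not app.ini keys
        section, sep, key = name[len(prefix):].partition("__")
        if not sep or not section or not key:
            continue  # malformed: missing section or key part
        sections[section][key] = value
    return dict(sections)


def render_ini(sections: dict[str, dict[str, str]]) -> str:
    """Render sections as ini text, sorted for stable output."""
    lines: list[str] = []
    for section in sorted(sections):
        lines.append(f"[{section}]")
        for key, value in sorted(sections[section].items()):
            lines.append(f"{key} = {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    env = {
        "USER_UID": "1000",
        "GITEA__server__DISABLE_SSH": "true",
        "GITEA__database__DB_TYPE": "postgres",
        "GITEA__database__HOST": "sql:5432",
    }
    print(render_ini(env_to_ini_sections(env)))
```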

@@ -0,0 +1,5 @@
+# Image spawned per user (override to a custom EVOLV image with mlflow client preinstalled)
+JUPYTER_NOTEBOOK_IMAGE=jupyter/datascience-notebook:latest
+# Admin users (comma-separated, consumed by jupyterhub_config.py)
+JUPYTERHUB_ADMIN_USERS=

@@ -0,0 +1,16 @@
+# jupyterhub
+Multi-user JupyterHub. Each authenticated user gets their own notebook container via DockerSpawner. **Cloud-only.**
+- **Networks**: `app` (UI proxied at `/jupyter` or subdomain) + `mgmt` (Docker socket so JupyterHub can spawn user containers)
+- **Spawned user containers** land on the `cloud-app` network so they can reach mlflow, influxdb (via grafana proxy), rabbitmq
+- **Config**: `config/jupyterhub_config.py` — DockerSpawner setup, authenticator, admin list, resource limits
+## TODO
+- DockerSpawner config (image, network, user volumes, idle culling)
+- Keycloak OAuth via `oauthenticator.generic.GenericOAuthenticator`
+- Build a project-specific notebook image with EVOLV libs + mlflow client + InfluxDB client preinstalled
+- Per-user persistent volume mounted at `/home/jovyan/work`
+- CPU / memory limits per user container
+- Cull idle servers (`c.JupyterHub.services` cull-idle pattern)

@@ -0,0 +1,25 @@
+# jupyterhub — multi-user notebook server (cloud only)
+# Networks: app (UI proxied by nginx) + mgmt (Docker socket for DockerSpawner)
+services:
+  jupyterhub:
+    image: jupyterhub/jupyterhub:5
+    restart: unless-stopped
+    networks: [app, mgmt]
+    volumes:
+      - jupyterhub-data:/srv/jupyterhub
+      - /var/run/docker.sock:/var/run/docker.sock
+      - ./config/jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro
+    environment:
+      TZ: ${TZ:-Europe/Amsterdam}
+      DOCKER_NOTEBOOK_IMAGE: ${JUPYTER_NOTEBOOK_IMAGE:-jupyter/datascience-notebook:latest}
+      DOCKER_NETWORK_NAME: cloud-app
+    # TODO: DockerSpawner config in jupyterhub_config.py; Keycloak OAuthAuthenticator;
+    # preinstalled libraries; per-user persistent volumes; CPU/memory limits
+networks:
+  app:
+  mgmt:
+volumes:
+  jupyterhub-data:
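`config/jupyterhub_config.py` is still a TODO. A minimal sketch of the DockerSpawner part, under several assumptions: the `dockerspawner` package is installed in the hub image (the stock `jupyterhub/jupyterhub` image may not ship it); the compose `app` network is the Docker network named by `DOCKER_NETWORK_NAME` (`cloud-app`), so spawned notebooks can reach the hub under its service name `jupyterhub`; and `JUPYTERHUB_ADMIN_USERS` is passed into the container, which the compose file above does not yet do (values in `.env` are only used for compose interpolation). The volume name pattern and resource limits are illustrative, not decisions from this commit, and the Keycloak authenticator is left out.

```python
# config/jupyterhub_config.py (sketch)
import os


def parse_admin_users(raw: str) -> set[str]:
    """Split the comma-separated JUPYTERHUB_ADMIN_USERS value into a set of names."""
    return {name.strip() for name in raw.split(",") if name.strip()}


# JupyterHub exec()s this file with get_config() in its namespace; the guard
# lets the helper above be imported and tested outside JupyterHub.
if "get_config" in globals():
    c = get_config()  # noqa: F821 (injected by JupyterHub's config loader)

    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    c.DockerSpawner.image = os.environ.get(
        "DOCKER_NOTEBOOK_IMAGE", "jupyter/datascience-notebook:latest"
    )
    # Spawned notebooks join this network so they can reach mlflow, rabbitmq, ...
    c.DockerSpawner.network_name = os.environ.get("DOCKER_NETWORK_NAME", "cloud-app")
    c.DockerSpawner.use_internal_ip = True

    # Hub API listens on all interfaces; notebooks address it by service name.
    c.JupyterHub.hub_ip = "0.0.0.0"
    c.JupyterHub.hub_connect_ip = "jupyterhub"

    # Hypothetical per-user named volume, mounted at the documented work dir.
    c.DockerSpawner.notebook_dir = "/home/jovyan/work"
    c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}
    c.DockerSpawner.remove = True  # delete stopped containers; the volume persists

    # Illustrative limits; tune per workload.
    c.DockerSpawner.mem_limit = "4G"
    c.DockerSpawner.cpu_limit = 2.0

    c.Authenticator.admin_users = parse_admin_users(
        os.environ.get("JUPYTERHUB_ADMIN_USERS", "")
    )
```

Setting `remove = True` keeps user containers disposable while the named volume carries the work directory across restarts, which matches the README's "per-user persistent volume" TODO.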

@@ -0,0 +1,3 @@
+MLFLOW_DB_NAME=mlflow
+MLFLOW_DB_USER=mlflow
+MLFLOW_DB_PASSWORD=

stacks/mlflow/README.md
@@ -0,0 +1,12 @@
+# mlflow
+MLflow tracking server + model registry. Used by data scientists running experiments from JupyterHub or local laptops. **Cloud-only.**
+- **Networks**: `app` (UI on port 5000, reverse-proxied at `/mlflow` or subdomain) + `data` (postgres backend in `sql`)
+- **Backend store**: postgres database `mlflow` — must be provisioned by `sql/config/init.d/`
+- **Artifact store**: local volume `mlflow-artifacts`. Switch to S3/MinIO when artifact volume grows beyond a few GB.
+- **TODO**:
+  - Provision `mlflow` DB + role in `sql` init scripts
+  - Keycloak OIDC via nginx `auth_request` (MLflow has no native auth — must front-end it)
+  - MinIO sidecar for S3-compatible artifact store
+  - Retention / cleanup policy for stale runs

stacks/mlflow/compose.yml
@@ -0,0 +1,27 @@
+# mlflow — experiment tracking + model registry (cloud only)
+# Networks: app (UI on 5000, proxied by nginx) + data (postgres backend on sql stack)
+services:
+  mlflow:
+    image: ghcr.io/mlflow/mlflow:v2.18.0
+    restart: unless-stopped
+    networks: [app, data]
+    command: >
+      mlflow server
+      --host 0.0.0.0
+      --port 5000
+      --backend-store-uri postgresql://${MLFLOW_DB_USER}:${MLFLOW_DB_PASSWORD}@sql:5432/${MLFLOW_DB_NAME}
+      --default-artifact-root /mlflow/artifacts
+      --serve-artifacts
+    volumes:
+      - mlflow-artifacts:/mlflow/artifacts
+    environment:
+      TZ: ${TZ:-Europe/Amsterdam}
+    # TODO: switch artifact store to S3/MinIO; Keycloak OIDC via nginx auth_request
+networks:
+  app:
+  data:
+volumes:
+  mlflow-artifacts:
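The compose file interpolates `MLFLOW_DB_PASSWORD` directly into the `postgresql://` backend-store URI, so a password containing URI-reserved characters such as `@`, `:` or `/` would break parsing of that URI. A minimal sketch of building the URI with percent-encoded credentials; the host and port default to `sql:5432` from the compose file, and the function name and example password are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def backend_store_uri(user: str, password: str, db: str,
                      host: str = "sql", port: int = 5432) -> str:
    """Build a postgresql:// URI with the credentials percent-encoded."""
    # safe="" makes quote() encode "/" too (it is left alone by default).
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )


if __name__ == "__main__":
    uri = backend_store_uri("mlflow", "p@ss:w/rd", "mlflow")
    print(uri)  # postgresql://mlflow:p%40ss%3Aw%2Frd@sql:5432/mlflow
    parts = urlsplit(uri)
    # The encoded password survives URI parsing and decodes back to the original.
    assert parts.hostname == "sql" and parts.port == 5432
    assert unquote(parts.password) == "p@ss:w/rd"
```

The simpler alternative is to generate database passwords from URI-safe characters only. Separately, MLflow documents `--artifacts-destination` as the storage location when `--serve-artifacts` is enabled; a plain filesystem path in `--default-artifact-root` is likely handed to clients as-is, so remote clients such as JupyterHub notebooks may try to write to their own local `/mlflow/artifacts`. This is worth checking before the first experiments are logged.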

@@ -0,0 +1 @@
+# mosquitto — broker uses config file, no env vars in stub

@@ -0,0 +1,11 @@
+# mosquitto
+Eclipse Mosquitto MQTT broker. **Reserved for the FROST (SensorThings API) stack** — separate from the general-purpose `rabbitmq` broker. **Cloud-only.**
+- **Network**: `app` (internal only — FROST services connect via service name `mosquitto`)
+- **No external ingress** by default. If FROST needs external MQTT publishers, route them through a separate nginx stream block on a different port (not 8883 — that belongs to rabbitmq).
+- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence
+- **TODO**:
+  - ACL aligned with FROST topic structure
+  - Persistence retention policy
+  - Optional shovel from `rabbitmq` if cross-broker forwarding is needed

@@ -0,0 +1,22 @@
+# mosquitto — MQTT broker reserved for the FROST (SensorThings) stack
+# Cloud-only. Internal to its own stack; no external ingress by default.
+# Networks: app
+services:
+  mosquitto:
+    image: eclipse-mosquitto:2.0
+    restart: unless-stopped
+    networks: [app]
+    volumes:
+      - ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
+      - mosquitto-data:/mosquitto/data
+      - mosquitto-log:/mosquitto/log
+    # No 'ports:' — FROST is the only intended consumer. If external MQTT
+    # access for FROST is needed later, add a separate nginx stream block.
+networks:
+  app:
+volumes:
+  mosquitto-data:
+  mosquitto-log:

@@ -1,2 +0,0 @@
-# mqtt — broker uses config file, not env
-# GUI vars land here once a GUI image is chosen

@@ -1,8 +0,0 @@
-# mqtt
-Eclipse Mosquitto broker. Cloud-side accepts external connections via nginx stream proxy. Edge-side is fully internal.
-- **Network**: `app` (no published port — nginx-proxy fronts external traffic on cloud)
-- **Edge note**: on edge stacks, broker stays purely internal; node-red bridges OPCUA → broker → cloud broker over WG
-- **Config**: `config/mosquitto.conf` — listener config, ACLs, persistence
-- **TODO**: ACL policy, bridge config to cloud broker, GUI choice (mqtt-explorer / hivemq web / custom)

@@ -1,24 +0,0 @@
-# mqtt — MQTT broker (Eclipse Mosquitto) + optional GUI
-# Networks: app (no port published — reached via nginx stream proxy on 8883)
-services:
-  mqtt-broker:
-    image: eclipse-mosquitto:2.0
-    restart: unless-stopped
-    networks: [app]
-    volumes:
-      - ./config/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
-      - mqtt-data:/mosquitto/data
-      - mqtt-log:/mosquitto/log
-    # No 'ports:' — nginx-proxy stream-proxies external 8883 to internal 1883/8883
-  # mqtt-gui:  # TODO: choose a GUI image (mqtt-explorer? hivemq web client? custom)
-  #   image: ...
-  #   networks: [app]
-networks:
-  app:
-volumes:
-  mqtt-data:
-  mqtt-log:

@@ -1,2 +1,4 @@
-# nginx-proxy — config-file-driven, no env vars in stub
-# Domain + cert settings will land here once SSL strategy is chosen
+LETSENCRYPT_EMAIL=
+# Production CA: https://acme-v02.api.letsencrypt.org/directory
+# Staging CA (testing): https://acme-staging-v02.api.letsencrypt.org/directory
+ACME_CA_URI=https://acme-v02.api.letsencrypt.org/directory

@@ -1,12 +1,47 @@
 # nginx-proxy
-The single web ingress. Reverse-proxies HTTPS UIs and stream-proxies MQTT-TLS.
+The single web ingress for cloud + edge. Reverse-proxies HTTPS UIs and stream-proxies MQTT-TLS to RabbitMQ. TLS certificates managed by a certbot sidecar (Let's Encrypt, HTTP-01 webroot challenge).
+- **Image**: stock `nginx:1.27-alpine` (we don't use `nginxproxy/nginx-proxy` because we need the `stream {}` context for MQTT-TLS, which that image doesn't expose cleanly)
+- **Sidecar**: `certbot/certbot:latest` — renews every 12h, shared `nginx-certs` + `nginx-acme-challenge` volumes
 - **Networks**: `edge` (the only port-publisher) + `app` (talks to upstream services)
 - **Host ports**: `tcp/80`, `tcp/443`, `tcp/8883`
-- **Config**:
-  - `config/nginx.conf` — base
-  - `config/conf.d/*.conf` — HTTP vhosts (one per upstream UI)
-  - `config/stream.d/mqtt.conf` — MQTT-TLS stream block, SNI route to mqtt broker
-  - `config/certs/` — TLS certs (volume-mounted from cert manager)
-- **TODO**: pick SSL strategy (acme-companion sidecar vs certbot vs internal PKI), write vhost templates per upstream
+## Config layout
+```
+config/
+├── nginx.conf          # base config — must include `stream {}` directive
+├── conf.d/             # HTTP vhosts (one per upstream UI)
+│   ├── grafana.conf
+│   ├── node-red.conf
+│   ├── gitea.conf
+│   └── ...
+└── stream.d/
+    └── mqtt.conf       # MQTT-TLS stream block, SNI route to rabbitmq:1883
+```
+Volumes:
+- `nginx-certs` — Let's Encrypt cert chains (`/etc/letsencrypt`), read-only mounted into nginx, writable from certbot
+- `nginx-acme-challenge` — webroot for HTTP-01 challenges (`/var/www/certbot`)
+## Initial cert issuance
+1. Start with HTTP-only nginx config (serving `/.well-known/acme-challenge/`).
+2. Issue:
+   ```bash
+   docker compose run --rm certbot certonly \
+     --webroot -w /var/www/certbot \
+     --email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \
+     -d gitea.example.com -d grafana.example.com -d nodered.example.com
+   ```
+3. Drop HTTPS vhost configs into `config/conf.d/` and reload nginx.
+The sidecar then renews automatically.
+## TODO
+- Write base `config/nginx.conf` (`http` + `stream` contexts)
+- Per-upstream vhost templates with OIDC `auth_request` to Keycloak
+- Decide internal PKI vs Let's Encrypt for cloud-internal hostnames not reachable from the public internet
+- Edge-side variant: bind to plant-LAN IP only, internal CA for plant.local hostnames
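`config/stream.d/mqtt.conf` is described but not yet written. One possible shape, rendered from Python so the certificate paths follow certbot's `live/<cert-name>/` layout under the `/etc/letsencrypt` mount used by the compose file (`<cert-name>` defaults to the first `-d` domain of the issuing command). This sketch assumes nginx terminates TLS on 8883 and forwards plain MQTT to `rabbitmq:1883` on the internal network; routing on SNI without terminating (`ssl_preread`) would instead require RabbitMQ itself to serve TLS. It also assumes `nginx.conf` has a top-level `stream { include /etc/nginx/stream.d/*.conf; }`, so the included file holds only `server` blocks. The function name and hostname are placeholders.

```python
def render_mqtt_stream_server(cert_name: str, listen_port: int = 8883,
                              upstream: str = "rabbitmq:1883") -> str:
    """Render an nginx stream-context server block that terminates MQTT-TLS.

    Intended for config/stream.d/mqtt.conf, included from inside stream { }.
    """
    live = f"/etc/letsencrypt/live/{cert_name}"
    return "\n".join([
        "server {",
        f"    listen {listen_port} ssl;",
        f"    ssl_certificate     {live}/fullchain.pem;",
        f"    ssl_certificate_key {live}/privkey.pem;",
        "    ssl_protocols       TLSv1.2 TLSv1.3;",
        f"    proxy_pass {upstream};",  # plain MQTT on the internal app network
        "}",
        "",
    ])


if __name__ == "__main__":
    # Placeholder certificate name; substitute the real certbot cert name.
    print(render_mqtt_stream_server("mqtt.example.com"))
```

With a literal hostname in `proxy_pass`, nginx resolves `rabbitmq` when it loads its configuration and refuses to start if the name cannot be resolved, so the rabbitmq stack needs to be up before nginx-proxy.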

@@ -1,22 +1,44 @@
-# nginx-proxy — TLS reverse proxy (HTTPS + MQTT-TLS)
+# nginx-proxy — TLS reverse proxy (HTTPS + MQTT-TLS stream proxy)
+# Stock nginx + certbot sidecar for Let's Encrypt automation.
 # Networks: edge (port publisher) + app (proxy targets)
 # Publishes: 80, 443, 8883 on the host
 services:
-  nginx-proxy:
+  nginx:
     image: nginx:1.27-alpine
     restart: unless-stopped
     networks: [edge, app]
     ports:
       - "80:80"
       - "443:443"
-      - "8883:8883"  # MQTT-TLS via stream{} block
+      - "8883:8883"  # MQTT-TLS via stream{} block, SNI route to rabbitmq
     volumes:
+      - ./config/nginx.conf:/etc/nginx/nginx.conf:ro
       - ./config/conf.d:/etc/nginx/conf.d:ro
       - ./config/stream.d:/etc/nginx/stream.d:ro
-      - ./config/nginx.conf:/etc/nginx/nginx.conf:ro
-      - nginx-certs:/etc/nginx/certs:ro
-    # TODO: SSL strategy (acme-companion sidecar vs certbot vs internal PKI)
+      - nginx-certs:/etc/letsencrypt:ro
+      - nginx-acme-challenge:/var/www/certbot:ro
+    depends_on:
+      - certbot
+  certbot:
+    image: certbot/certbot:latest
+    restart: unless-stopped
+    volumes:
+      - nginx-certs:/etc/letsencrypt
+      - nginx-acme-challenge:/var/www/certbot
+    entrypoint: /bin/sh -c
+    command: >
+      "trap exit TERM;
+      while :; do
+      certbot renew --webroot -w /var/www/certbot --quiet;
+      sleep 12h & wait $${!};
+      done"
+    # Initial issuance is manual:
+    #   docker compose run --rm certbot certonly \
+    #     --webroot -w /var/www/certbot \
+    #     --email "$LETSENCRYPT_EMAIL" --agree-tos --no-eff-email \
+    #     -d <host1> -d <host2> ...
 networks:
   edge:
@@ -24,3 +46,4 @@ networks:
 volumes:
   nginx-certs:
+  nginx-acme-challenge:
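The certbot sidecar wakes every 12 hours, but `certbot renew` only replaces certificates that are inside its renewal window (by default roughly 30 days before expiry for Let's Encrypt's 90-day certificates), so most iterations do nothing. A minimal monitoring sketch of that check, using the `notAfter` string format that Python's `ssl` module works with; the 30-day constant approximates certbot's default and the function names are hypothetical.

```python
import ssl
import time
from typing import Optional

# Roughly when certbot renews a 90-day Let's Encrypt certificate by default.
RENEW_BEFORE_DAYS = 30


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left until a certificate's notAfter time, e.g. 'Jun  1 12:00:00 2026 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def due_for_renewal(not_after: str, now: Optional[float] = None,
                    window_days: float = RENEW_BEFORE_DAYS) -> bool:
    """True if the certificate is inside the renewal window."""
    return days_until_expiry(not_after, now) <= window_days


if __name__ == "__main__":
    # Hypothetical notAfter value; in practice read it from the served certificate.
    print(due_for_renewal("Jun  1 12:00:00 2026 GMT"))
```

Two related points about the compose file above. With static certificate paths, nginx reads the certificate files when it starts or reloads, so a renewed certificate is only served after an `nginx -s reload`, which nothing above triggers yet. And because the `certbot` service overrides its entrypoint with `/bin/sh -c`, the one-off `docker compose run --rm certbot certonly ...` command would be handed to `sh -c` rather than to certbot; adding `--entrypoint certbot` to that `run` invocation avoids this.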

@@ -0,0 +1,3 @@
+RABBITMQ_USER=admin
+RABBITMQ_PASSWORD=
+RABBITMQ_VHOST=/

stacks/rabbitmq/README.md
@@ -0,0 +1,17 @@
+# rabbitmq
+General-purpose message broker. AMQP for app-to-app messaging; MQTT plugin for external MQTT-TLS clients (fronted by nginx-proxy stream block).
+Used at both cloud and edge. The `mosquitto` stack is reserved for the FROST SensorThings stack only — do not confuse the two.
+- **Network**: `app` — no published port
+- **External MQTT** clients reach `nginx-proxy:8883` (cloud) which stream-proxies to `rabbitmq:1883` internally. Edge brokers are internal-only.
+- **Management UI**: port 15672 → reverse-proxied through nginx-proxy
+- **Plugins to enable** in `config/enabled_plugins`:
+  ```
+  [rabbitmq_management, rabbitmq_mqtt, rabbitmq_web_mqtt].
+  ```
+- **TODO**:
+  - ACL policies
+  - VHost layout (cloud-bus, plant-buses, ml-bus)
+  - Shovel/federation to `mosquitto` if FROST needs MQTT bridge to general broker

@@ -0,0 +1,26 @@
+# rabbitmq — general-purpose message broker (AMQP + MQTT plugin)
+# Used at both cloud and edge. Cloud-side: external clients reach it via
+# nginx stream proxy on tcp/8883. Edge-side: internal only.
+# Networks: app
+services:
+  rabbitmq:
+    image: rabbitmq:3.13-management
+    restart: unless-stopped
+    networks: [app]
+    environment:
+      RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER}
+      RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}
+      RABBITMQ_DEFAULT_VHOST: ${RABBITMQ_VHOST:-/}
+      TZ: ${TZ:-Europe/Amsterdam}
+    volumes:
+      - rabbitmq-data:/var/lib/rabbitmq
+      - ./config/enabled_plugins:/etc/rabbitmq/enabled_plugins:ro
+      - ./config/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
+    # No 'ports:' — nginx-proxy stream-proxies external 8883 to internal 1883
+networks:
+  app:
+volumes:
+  rabbitmq-data:
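Because one RabbitMQ instance serves both protocols, AMQP consumers can subscribe to data that external clients publish over MQTT. By default the MQTT plugin publishes to the `amq.topic` exchange and translates topic syntax: `/` level separators become `.`, and the single-level wildcard `+` becomes `*` (`#` is the multi-level wildcard in both). A minimal sketch of that translation for building AMQP binding keys; it deliberately rejects topics that already contain `.`, which it does not attempt to handle, and the example topic in the usage note is made up.

```python
def mqtt_filter_to_amqp_binding_key(topic_filter: str) -> str:
    """Translate an MQTT topic filter into an AMQP topic-exchange binding key.

    Mirrors the RabbitMQ MQTT plugin's default mapping: '/' -> '.', '+' -> '*',
    '#' unchanged. Topics that already contain '.' are rejected by this sketch.
    """
    if "." in topic_filter:
        raise ValueError(f"'.' in MQTT topic not handled by this sketch: {topic_filter!r}")
    levels = topic_filter.split("/")
    return ".".join("*" if level == "+" else level for level in levels)
```

For a hypothetical MQTT filter `gemaal1/+/level`, an AMQP consumer would bind its queue to `amq.topic` with the key `gemaal1.*.level`.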

@@ -1,11 +1,9 @@
 # sql
-Central configuration database — the "single point of config" backing Keycloak, Gitea, and other stacks that need a relational store. **Cloud-only stack.**
+Central configuration database — the "single point of config" backing Keycloak, Gitea, MLflow, and any stack that needs a relational store. **Cloud-only.**
+- **Engine**: postgres 16-alpine
 - **Network**: `data` only (no internet egress)
-- **Engine**: stub uses **postgres:16-alpine** pending decision (postgres vs mariadb vs mysql)
-- **Volume**: `sql-data`
-- **TODO**:
-  - confirm engine choice (likely postgres for keycloak + gitea compatibility)
-  - per-app database/role provisioning (init scripts in `config/init.d/`)
-  - backup strategy (pg_dump cron sidecar vs streaming replica)
+- **Volume**: `sql-data` (PGDATA)
+- **Init scripts**: `config/init.d/*.sql` runs on first start — provisions per-app databases/roles (gitea, keycloak, mlflow, …)
+- **TODO**: backup strategy (pg_dump cron sidecar vs streaming replica)

@@ -1,6 +1,5 @@
-# sql — single point of config DB (cloud only)
+# sql — central config DB (postgres, cloud only)
 # Networks: data (no internet egress)
-# TBD: postgres / mariadb / mysql — stub uses postgres pending decision
 services:
   sql:
@@ -14,6 +13,7 @@ services:
       TZ: ${TZ:-Europe/Amsterdam}
     volumes:
       - sql-data:/var/lib/postgresql/data
+      - ./config/init.d:/docker-entrypoint-initdb.d:ro
 networks:
   data:
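The postgres image runs `*.sql` files (and `*.sh` scripts) from `/docker-entrypoint-initdb.d` in sorted filename order, and only when the data directory is empty. An app added after the first start therefore has to be provisioned by hand against the existing `sql-data` volume. A minimal sketch that generates one provisioning file per app, using the app names listed in the README; the file-name scheme, the helper names and the way passwords are supplied are assumptions, and real secrets should not be committed to `config/init.d/`.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def provision_sql(app: str, password: str) -> str:
    """SQL creating a login role and a database it owns, both named after the app."""
    role = quote_ident(app)
    return (
        f"CREATE ROLE {role} LOGIN PASSWORD {quote_literal(password)};\n"
        f"CREATE DATABASE {role} OWNER {role};\n"
    )


if __name__ == "__main__":
    for i, app in enumerate(["gitea", "keycloak", "mlflow"], start=1):
        # Hypothetical naming scheme; the numeric prefix fixes execution order.
        filename = f"{i:02d}-{app}.sql"
        print(f"-- {filename}")
        print(provision_sql(app, password="change-me"))
```

Making each role the owner of its database matters on PostgreSQL 15 and later, where ordinary roles no longer get `CREATE` on the `public` schema by default but the database owner does. The passwords must match the corresponding `*_DB_PASSWORD` values in each stack's `.env`.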