RnD/infra

Author	SHA1	Message	Date
R de Ren	33a794e35d	feat(sso): wire Keycloak SSO end-to-end across all apps New stack: - stacks/oauth2-proxy/ — per-app sidecars (mlflow, portainer, rabbitmq) that gate vhosts via nginx auth_request against Keycloak's wbd realm. Native OIDC wired into: - grafana (generic_oauth, role-attribute-path → Admin/Editor/Viewer) - jupyterhub (oauthenticator.GenericOAuthenticator) - node-red (passport-openidconnect; in-memory state store + users() resolver because adminAuth doesn't expose req.session) - jenkins (oic-auth plugin via JCasC; matrix-auth for authz; setup wizard suppressed; custom image with plugins.txt) Infra fixes uncovered while bringing the above online: - nginx-proxy: bump proxy_buffer_size to 16k so oauth2-proxy callbacks don't 502 on the JWT-bearing Set-Cookie header. - nginx-proxy: add `resolver 127.0.0.11 valid=30s` so service names re-resolve after sidecar recreates (was cross-wiring oauth2-proxy upstreams after restart). - jupyterhub: pass --allow-root to the singleuser spawner (hub runs as root inside its container; jupyter-server refused root without flag). - jupyterhub Dockerfile: install jupyterlab + notebook so SimpleLocalProcessSpawner has something to launch. - node-red Dockerfile: install passport-openidconnect into the image so settings.js can require() it. - portainer: pre-seed local admin via --admin-password=<bcrypt-hash> so the 5-minute "no admin → lockout" timer can never trigger. - deploy.sh: restore executable bit (was 644 in repo). Admin/viewer policy: - Created realm role `app-admin` in keycloak wbd realm. - Grafana maps app-admin → Admin (default Viewer). - Jenkins matrix-auth grants r.de.ren Overall/Administer, authenticated users get Overall/Read + Job/Read + View/Read. - Node-RED: NODERED_ADMIN_USERS env list → permissions "*", others ["read"]. (TODO: switch to app-admin realm role.) - JupyterHub: JUPYTERHUB_ADMIN_USERS env list. (Same TODO.) - Gitea: r.de.ren pre-created as local admin; OIDC auto-links via email. Docs: - README, cloud/README, stacks/oauth2-proxy/README, and per-stack READMEs updated to reflect the new state and remove resolved TODOs. - cloud/.env.example gains all the new OIDC client + cookie-secret keys. - cloud/README documents the full kcadm realm bootstrap, including the hardcoded-audience mapper and post-logout redirect URIs that are non-obvious gotchas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 18:34:37 +00:00
znetsixe	f69453df99	refactor(dns): rename frost.wbd-rd.nl → sta.wbd-rd.nl; drop redundant portainer.wbd-rd.nl Match the short-functional naming convention used by the other vhosts (git, auth, dash, flow, ml, hub, ops, mq, ci, mqtt). FROST implements OGC SensorThings API, so `sta` is the natural fit. portainer.wbd-rd.nl is dropped from deploy.sh HOSTS — there is no nginx vhost for it; portainer is already served via ops.wbd-rd.nl. DNS prereq for first deploy is now: create one new A record for sta.wbd-rd.nl → cloud public IP. All other short subdomains already point correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 16:46:32 +02:00
znetsixe	4117ec6063	feat(cloud): single-shot deploy.sh + FROST stack + healthchecks Stage 5 — make the cloud composition spin up in one command and add the SensorThings (FROST) stack as a fully segregated tenant. cloud/deploy.sh — idempotent, 7-step bring-up: preflight → validate → up + wait → cert state → issue/renew → service status → endpoint smoke test. Reissues LE cert only when current issuer no longer matches ACME_CA_URI. Move-aside-then- restore-on-failure so the bootstrap cert survives a failed certbot. stacks/frost — new stack, segregated from shared sql/rabbitmq: - dedicated postgis container (frost-db) - dedicated internal mosquitto bus (frost-mosquitto) - frost-http + frost-mqtt on a private frost-internal network, joined to cloud-app only for nginx ingress at frost.wbd-rd.nl - shared mosquitto stack deleted; rabbitmq remains the only public MQTT broker (mqtt.wbd-rd.nl:8883 via stream proxy) stacks/sql — pg_isready healthcheck so keycloak/gitea/mlflow can gate on service_healthy via cloud-level depends_on overrides. stacks/nginx-proxy: - nginx-init service generates a self-signed bootstrap cert on fresh deploy so nginx starts before certbot has issued a real one - frost.wbd-rd.nl vhost (/FROST-Server → frost-http:8080, /mqtt → frost-mqtt:9876 WebSocket) stacks/mlflow — custom Dockerfile (upstream + psycopg2-binary) so the official image can speak to the shared sql backend. stacks/jupyterhub — DummyAuthenticator stub gated by JUPYTERHUB_ADMIN_PASSWORD; TODO comments point at OIDC + DockerSpawner. stacks/rabbitmq — config/{enabled_plugins,rabbitmq.conf} stubs (management + mqtt plugins, MQTT auth required). stacks/portainer — ports unpublished; nginx now the only ingress. stacks/node-red — pin to 4.1 (the floating "4" tag does not exist). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 16:37:58 +02:00
znetsixe	035ac757ae	feat(gitea): Stage 4 — hardened compose with OIDC-ready config + Keycloak CLI auth source stacks/gitea/compose.yml — full production-grade env: - Server posture: PROTOCOL=http (nginx terminates TLS), DOMAIN=git.wbd-rd.nl, DISABLE_SSH=true, INSTALL_LOCK=true (skip web wizard). - Postgres backend (DB+role auto-provisioned by sql/config/init.d/). - Local registration disabled; users provisioned via Keycloak OIDC with ENABLE_AUTO_REGISTRATION=true so first OIDC login auto-creates the matching local account (USERNAME=nickname, ACCOUNT_LINKING=auto). - Mail stub via postfix on app network (ENABLED=false until postfix is up). - Repos default to private. - GITEA_OAUTH_* env vars are pass-through values consumed only by the post-deploy CLI step; gitea itself doesn't read them. stacks/gitea/.env.example — DB connection, OAuth client ID/secret/discovery URL, mail-from. Empty placeholders for secrets. stacks/gitea/README.md — full Stage 5 deploy script: 1. Fill GITEA_DB_PASSWORD + GITEA_OAUTH_CLIENT_SECRET in cloud/.env 2. docker compose up -d gitea 3. gitea admin user create --admin --random-password 4. gitea admin auth add-oauth --provider openidConnect --auto-discover-url https://auth.wbd-rd.nl/realms/wbd/.well-known/openid-configuration 5. Browse https://git.wbd-rd.nl/ → "Sign in with keycloak" cloud/compose.yml — uncomment gitea include. cloud/.env.example — add GITEA_DOMAIN, GITEA_OAUTH_*, GITEA_MAIL_FROM. .gitignore line 2 (`.env`) already catches .env files at any depth (verified with `git check-ignore`). Secrets won't be committed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 14:10:54 +02:00
znetsixe	af354a4b9e	feat(cloud): short function-based subdomains + harden keycloak with postgres Subdomain rename (Versio side keeps original tool-named hostnames) - nginx vhosts updated: grafana -> dash.wbd-rd.nl gitea -> git.wbd-rd.nl keycloak -> auth.wbd-rd.nl node-red -> flow.wbd-rd.nl mlflow -> ml.wbd-rd.nl jupyter -> hub.wbd-rd.nl portainer -> ops.wbd-rd.nl rabbitmq -> mq.wbd-rd.nl jenkins -> ci.wbd-rd.nl mqtt -> mqtt.wbd-rd.nl (no Versio conflict assumed) - nginx-proxy README: bootstrap cert -d list + DNS A-record prereqs updated - cloud/.env.example: GITEA_ROOT_URL, GRAFANA_ROOT_URL, KEYCLOAK_HOSTNAME Function-based names are tool-agnostic (a Grafana -> Kibana swap leaves dash.wbd-rd.nl meaningful) and avoid one-off "*2" suffixes. Keycloak hardening - Switch backend from bundled file storage to postgres (keycloak DB already provisioned by sql/config/init.d/01-databases.sh). - KC_HOSTNAME=auth.wbd-rd.nl, KC_PROXY_HEADERS=xforwarded for nginx reverse-proxy posture; KC_HTTP_ENABLED=true since nginx terminates TLS. - Added KC_HOSTNAME_STRICT, KC_HEALTH_ENABLED, KC_METRICS_ENABLED. - Service joins app + mgmt + data networks (data needed for postgres). - Mounted config/realms/ for realm-as-code (kc.sh import) — TODO to populate once realm + clients are designed. - README documents the recommended realm structure (wbd realm, one client per app with redirect URIs) and the oauth2-proxy approach for apps without native OIDC (mlflow, portainer-CE). cloud - Uncomment keycloak include in cloud/compose.yml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:54:57 +02:00
znetsixe	5d95f8bfcc	feat(cloud): harden nginx-proxy + sql foundation; HTTP-01 interim cert plan Wire up the three foundation stacks (nginx-proxy, sql, portainer) in cloud/compose.yml and add real configs for the first two. nginx-proxy - Base nginx.conf with http + stream contexts, modern TLS profile, client_max_body_size baseline for gitea LFS / mlflow artifacts. - Vhosts under conf.d/: grafana, gitea, keycloak, nodered, mlflow, jupyter, portainer (HTTPS upstream), rabbitmq, jenkins. WebSocket upgrade headers where needed (grafana live, node-red editor, jupyterhub kernels, jenkins agents). - conf.d/00-default.conf serves /.well-known/acme-challenge/ on :80 and 301-redirects everything else. - stream.d/mqtt.conf terminates MQTT-TLS at 8883, proxies to rabbitmq:1883 internally. - All vhosts reference /etc/letsencrypt/live/infra/* — a stable path via certbot --cert-name infra, so the wildcard migration changes nothing in the vhost files. - README documents: HTTP-01 SAN interim during Versio period → DNS-01 wildcard via certbot-dns-transip after migration; bootstrap procedure (self-signed fallback → real cert issuance → reload). sql - config/init.d/01-databases.sh provisions gitea/keycloak/mlflow databases + roles on first start. Idempotent only via fresh data volume — changing the script after first run requires manual psql or a volume wipe. - compose env extended with GITEA_DB_PASSWORD, KEYCLOAK_DB_PASSWORD, MLFLOW_DB_PASSWORD. cloud - include: now wires nginx-proxy + sql + portainer. Other stacks stay commented for future rounds. - .env.example adds KEYCLOAK_DB_PASSWORD and sensible defaults (LETSENCRYPT_EMAIL, GRAFANA_ROOT_URL, KEYCLOAK_HOSTNAME, GITEA_ROOT_URL, POSTFIX_FROM_DOMAIN all pointing at wbd-rd.nl). - Operator note inline: bring portainer's standalone instance down before deploying via cloud compose; comment its ports: block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:43:35 +02:00
znetsixe	2f5e3b4183	feat: SQL=postgres, nginx+certbot, MQTT split, ML stacks, gitea HTTPS-only, gemaal1 site Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks. Locked decisions - sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning - nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO). Chose stock over nginxproxy/nginx-proxy because stream{} is required for MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883. - gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published. MQTT split - Remove stacks/mqtt placeholder. - Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin), used at both cloud and edge. External MQTT clients reach cloud broker via nginx stream-proxy on 8883. - Add stacks/mosquitto — reserved for the FROST (SensorThings) stack only. Cloud-only. Internal to its own stack; no external ingress. ML / notebooks (cloud-only) - stacks/mlflow — experiment tracking + model registry. Postgres backend on sql stack; local volume for artifacts (S3/MinIO is a TODO). - stacks/jupyterhub — multi-user notebook server. DockerSpawner via mounted docker.sock; users spawn into cloud-app network so they can reach mlflow, influxdb (via grafana), rabbitmq. Sites - sites/gemaal1 — first edge deployment scaffold. Site-local override template for binding nginx to PLANT_LAN_IP. Docs - README + docs/architecture.md updated: stacks table now lists 15 stacks, ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy section locked, MQTT-split section added, Gitea HTTPS-only noted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:22:46 +02:00
znetsixe	8ab9061983	scaffold: hub-and-spoke layout, 4-network topology, 13 stack stubs Initial structure for R&D infrastructure: - stacks/ — 13 reusable, runnable stack stubs (kebab-case) cloud-and-edge: node-red, influxdb, grafana, keycloak, portainer, nginx-proxy, mqtt, postfix cloud-only: wireguard-server, gitea, jenkins, sql (postgres stub) edge-only: wireguard-client - cloud/ — single central hub composition with 4 networks (edge, app, data internal, mgmt) and include: stubs - sites/ — per-plant edge folders (template README only for now) - docs/architecture.md — hub-and-spoke + ingress + segmentation rationale Network model: only nginx-proxy (80/443/8883) and wireguard-server (51820/udp) publish ports on the cloud host. Edge nginx publishes 80/443 on plant-LAN interface only. MQTT cloud-side via nginx stream proxy; MQTT edge-side internal-only; Postfix outbound-only. OT layer (OPCUA, PLCs) is out of scope for this repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 12:37:59 +02:00
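
The function-based hostname scheme introduced in af354a4b9e and amended in f69453df99 can be read as a simple lookup table. The Python below is an illustrative sketch, not code from the repository: `SUBDOMAINS`, `hosts`, and `certbot_domain_args` are hypothetical names, and putting every host (including mqtt and sta) on one certificate is an assumption. Only the subdomain labels, the `wbd-rd.nl` domain, and `--cert-name infra` come from the commit messages.

```python
DOMAIN = "wbd-rd.nl"

# Tool -> function-based subdomain label, as listed in af354a4b9e.
# "frost" -> "sta" reflects the later rename in f69453df99.
SUBDOMAINS = {
    "grafana": "dash",
    "gitea": "git",
    "keycloak": "auth",
    "node-red": "flow",
    "mlflow": "ml",
    "jupyter": "hub",
    "portainer": "ops",
    "rabbitmq": "mq",
    "jenkins": "ci",
    "mqtt": "mqtt",
    "frost": "sta",
}


def hosts(domain=DOMAIN):
    """Return the fully qualified vhost names, in table order."""
    return [f"{label}.{domain}" for label in SUBDOMAINS.values()]


def certbot_domain_args(domain=DOMAIN):
    """Build a certbot argument list: one stable cert name, one -d per host.

    Assumption: all hosts share the single SAN certificate named "infra".
    """
    args = ["--cert-name", "infra"]
    for host in hosts(domain):
        args += ["-d", host]
    return args


if __name__ == "__main__":
    print(" ".join(certbot_domain_args()))
```

Because the labels describe function rather than tool, replacing a tool changes only a key in the table; the hostnames, and therefore the DNS records and the certificate's name list, stay the same.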

8 Commits
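
The env-list admin policy described in 33a794e35d (NODERED_ADMIN_USERS and JUPYTERHUB_ADMIN_USERS) could look roughly like the following. This is a minimal Python sketch under stated assumptions: the commit does not give the list format, so a comma-separated value is assumed, and `parse_user_list` and `permissions_for` are hypothetical helper names. The actual Node-RED logic lives in settings.js (JavaScript) and is not shown in the log.

```python
from __future__ import annotations

import os


def parse_user_list(raw: str) -> set[str]:
    """Split a comma-separated user list, trimming whitespace and dropping blanks.

    Assumption: the env variable is comma-separated, e.g. "r.de.ren,alice".
    """
    return {name.strip() for name in raw.split(",") if name.strip()}


def permissions_for(username: str, admins: set[str]) -> list[str]:
    """Mirror the policy in the commit: listed admins get "*", others read-only."""
    return ["*"] if username in admins else ["read"]


# An unset or empty variable yields an empty admin set, so every user is read-only.
admins = parse_user_list(os.environ.get("NODERED_ADMIN_USERS", ""))

if __name__ == "__main__":
    for user in ("r.de.ren", "someone.else"):
        print(user, permissions_for(user, admins))
```

Defaulting to read-only when the variable is missing keeps a misconfigured deployment from granting admin rights by accident. The TODO in the commit would replace this lookup with a check for the `app-admin` realm role.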