infra/cloud/README.md
R de Ren 33a794e35d feat(sso): wire Keycloak SSO end-to-end across all apps
New stack:
- stacks/oauth2-proxy/ — per-app sidecars (mlflow, portainer, rabbitmq)
  that gate vhosts via nginx auth_request against Keycloak's wbd realm.

Native OIDC wired into:
- grafana       (generic_oauth, role-attribute-path → Admin/Editor/Viewer)
- jupyterhub    (oauthenticator.GenericOAuthenticator)
- node-red      (passport-openidconnect; in-memory state store + users()
                 resolver because adminAuth doesn't expose req.session)
- jenkins       (oic-auth plugin via JCasC; matrix-auth for authz; setup
                 wizard suppressed; custom image with plugins.txt)

Infra fixes uncovered while bringing the above online:
- nginx-proxy: bump proxy_buffer_size to 16k so oauth2-proxy callbacks
  don't 502 on the JWT-bearing Set-Cookie header.
- nginx-proxy: add `resolver 127.0.0.11 valid=30s` so service names
  re-resolve after sidecar recreates (was cross-wiring oauth2-proxy
  upstreams after restart).
- jupyterhub: pass --allow-root to the singleuser spawner (hub runs as
  root inside its container; jupyter-server refused root without flag).
- jupyterhub Dockerfile: install jupyterlab + notebook so
  SimpleLocalProcessSpawner has something to launch.
- node-red Dockerfile: install passport-openidconnect into the image
  so settings.js can require() it.
- portainer: pre-seed local admin via --admin-password=<bcrypt-hash>
  so the 5-minute "no admin → lockout" timer can never trigger.
- deploy.sh: restore executable bit (was 644 in repo).

Admin/viewer policy:
- Created realm role `app-admin` in keycloak wbd realm.
- Grafana maps app-admin → Admin (default Viewer).
- Jenkins matrix-auth grants r.de.ren Overall/Administer, authenticated
  users get Overall/Read + Job/Read + View/Read.
- Node-RED: NODERED_ADMIN_USERS env list → permissions "*", others
  ["read"]. (TODO: switch to app-admin realm role.)
- JupyterHub: JUPYTERHUB_ADMIN_USERS env list. (Same TODO.)
- Gitea: r.de.ren pre-created as local admin; OIDC auto-links via email.

Docs:
- README, cloud/README, stacks/oauth2-proxy/README, and per-stack
  READMEs updated to reflect the new state and remove resolved TODOs.
- cloud/.env.example gains all the new OIDC client + cookie-secret keys.
- cloud/README documents the full kcadm realm bootstrap, including the
  hardcoded-audience mapper and post-logout redirect URIs that are
  non-obvious gotchas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:34:37 +00:00


cloud

The single central hub. One deployment, internet-facing.

What runs here

nginx-proxy, wireguard-server, keycloak, oauth2-proxy, portainer, influxdb, grafana, node-red, rabbitmq, postfix, gitea, jenkins, mlflow, jupyterhub, frost, sql.

See ../docs/architecture.md for the full network topology and ingress table.

Every human-accessible UI is gated by Keycloak SSO (wbd realm). Apps with native OIDC (gitea, grafana, node-red, jenkins, jupyterhub) speak OIDC directly; apps without (mlflow, portainer, rabbitmq) are gated by an oauth2-proxy sidecar via nginx auth_request. See the Keycloak bootstrap section below.

Run

cp .env.example .env     # fill in real secrets first
./deploy.sh              # one-shot bring-up: containers + cert + smoke test

deploy.sh is idempotent — rerun any time. It will:

  1. Preflight — check .env has all required vars
  2. Validate docker compose config
  3. Bring up containers, wait for sql healthcheck, wait for nginx :80
  4. Inspect cert — figure out whether the current cert is self-signed, staging, or prod
  5. Issue / renew the SAN cert via certbot only when needed (initial issuance, or when ACME_CA_URI no longer matches the current issuer); reload nginx
  6. Status — show docker compose ps
  7. Smoke test every *.wbd-rd.nl vhost over loopback
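
The preflight in step 1 can be pictured as a check on the env file itself. The following is a minimal sketch, not the actual deploy.sh code: the helper name missing_env_keys is hypothetical, and the caller supplies the list of required keys.

```shell
# Hypothetical preflight helper: print every required key that is absent
# from the env file or has an empty value. The real deploy.sh may differ.
# Note: a quoted empty value such as KEY="" is not detected by this check.
missing_env_keys() {
  env_file=$1
  shift
  for key in "$@"; do
    # A key counts as present only if something follows the '=' sign.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      printf '%s\n' "$key"
    fi
  done
}
```

For example, `missing_env_keys .env KEYCLOAK_ADMIN KEYCLOAK_ADMIN_PASSWORD ACME_CA_URI` prints nothing when all three keys are filled in.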

The script reissues the cert only when the CA in .env changes (e.g. staging → prod) or when only the bootstrap dummy cert is present, so repeated runs do not consume Let's Encrypt rate limits.
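
The reissue rule can be sketched as a comparison between the kind of cert currently installed and the kind the configured CA would issue. This is an illustration under assumptions, not the deploy.sh implementation: the function names are hypothetical, and classifying by the strings "(STAGING)" and "Let's Encrypt" in the issuer (as printed by `openssl x509 -noout -issuer`) is an assumption about how the CA names its issuers.

```shell
# Classify a certificate from its issuer string.
# Assumption: staging issuers contain "(STAGING)", prod issuers contain
# "Let's Encrypt"; anything else is treated as the self-signed dummy.
cert_kind() {
  case $1 in
    *"(STAGING)"*) echo staging ;;
    *"Let's Encrypt"*) echo prod ;;
    *) echo self-signed ;;
  esac
}

# Map the configured ACME directory URL to the kind of cert it issues.
ca_kind() {
  case $1 in
    *acme-staging*) echo staging ;;
    *) echo prod ;;
  esac
}

# Exit status 0 means "request a new cert": either only the dummy is
# present, or the current issuer no longer matches ACME_CA_URI.
needs_reissue() {
  issuer=$1 ca_uri=$2
  [ "$(cert_kind "$issuer")" != "$(ca_kind "$ca_uri")" ]
}
```

Comparing kinds rather than exact issuer names keeps the check stable when the CA rotates its intermediate certificates.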

Staging → prod flip

  1. Verify everything works with the staging cert (browser will warn — that's normal)
  2. Edit .env: change ACME_CA_URI to https://acme-v02.api.letsencrypt.org/directory
  3. ./deploy.sh — script detects the CA change and force-renews against prod
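
Step 2 can also be scripted instead of edited by hand. A minimal sketch, assuming a hypothetical helper named set_env_var; it rewrites the file through a temp file rather than `sed -i`, whose syntax differs between GNU and BSD tools.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# assignment or appending a new one. The key ends up on the last line, and
# the rewritten file takes the temp file's (owner-only) permissions.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except the old assignment; grep exits non-zero when
  # nothing is left to copy, which is not an error here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

Usage for the flip: `set_env_var .env ACME_CA_URI https://acme-v02.api.letsencrypt.org/directory`, followed by `./deploy.sh`.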

Ingress (host port bindings)

Port              Container
tcp/80, tcp/443   nginx-proxy
tcp/8883          nginx-proxy (MQTT-TLS via stream block)
udp/51820         wireguard-server

Everything else stays on the internal app / data / mgmt networks.

Keycloak realm bootstrap (one-time)

After deploy.sh succeeds, Keycloak is up at https://auth.wbd-rd.nl/ with only the master realm. You need to create the wbd realm and the per-app OIDC clients before SSO works. The bootstrap is driven entirely by kcadm.sh inside the keycloak container, so it is reproducible:

cd cloud
set -a && . ./.env && set +a

KC="docker compose exec -T keycloak /opt/keycloak/bin/kcadm.sh"

# 1. Authenticate against master realm
$KC config credentials --server http://localhost:8080 --realm master \
  --user "$KEYCLOAK_ADMIN" --password "$KEYCLOAK_ADMIN_PASSWORD"

# 2. Create the realm
$KC create realms -r master \
  -s realm=wbd -s enabled=true -s displayName="WBD R&D" \
  -s registrationAllowed=false -s resetPasswordAllowed=true -s rememberMe=true

# 3. Create an OIDC client per app. Pattern:
#       clientId    = stable lowercase name (matches *_OAUTH_CLIENT_ID in .env)
#       redirectUri = the app's documented OIDC callback URL
#    Hardcoded audience mapper is critical for oauth2-proxy clients — without
#    it the access token's aud will be [realm-management, account] and
#    oauth2-proxy will 500 on the callback.

create_client() {
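  local CID=$1 REDIRECT=$2 SECRET=$3
  # Derive the app's host from the redirect URI: drop the scheme, then the path.
  # (${REDIRECT#https://*/} would strip the host and keep the path instead.)
  local HOST=${REDIRECT#https://}
  HOST=${HOST%%/*}
  # Declare ID separately so a failing kcadm call is not masked by `local`.
  local ID
  ID=$($KC create clients -r wbd \
    -s clientId="$CID" -s enabled=true -s protocol=openid-connect \
    -s publicClient=false -s standardFlowEnabled=true \
    -s "redirectUris=[\"$REDIRECT\"]" \
    -s "attributes.\"post.logout.redirect.uris\"=\"https://$HOST/*\"" \
    ${SECRET:+-s secret="$SECRET"} -i)
  $KC create "clients/$ID/protocol-mappers/models" -r wbd \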
    -s name=audience-self -s protocol=openid-connect \
    -s protocolMapper=oidc-audience-mapper \
    -s "config.\"included.client.audience\"=$CID" \
    -s "config.\"access.token.claim\"=true" -s "config.\"id.token.claim\"=false"
  echo "$CID -> $ID"
}

create_client gitea         "https://git.wbd-rd.nl/user/oauth2/keycloak/callback" "$GITEA_OAUTH_CLIENT_SECRET"
create_client grafana       "https://dash.wbd-rd.nl/login/generic_oauth"          "$GRAFANA_OAUTH_CLIENT_SECRET"
create_client node-red      "https://flow.wbd-rd.nl/auth/strategy/callback/"      "$NODERED_OAUTH_CLIENT_SECRET"
create_client jenkins       "https://ci.wbd-rd.nl/securityRealm/finishLogin"      "$JENKINS_OAUTH_CLIENT_SECRET"
create_client jupyterhub    "https://hub.wbd-rd.nl/hub/oauth_callback"            "$JUPYTERHUB_OAUTH_CLIENT_SECRET"
create_client mlflow        "https://ml.wbd-rd.nl/oauth2/callback"                "$MLFLOW_OAUTH_CLIENT_SECRET"
create_client portainer-ce  "https://ops.wbd-rd.nl/oauth2/callback"               "$PORTAINER_OAUTH_CLIENT_SECRET"
create_client rabbitmq      "https://mq.wbd-rd.nl/oauth2/callback"                "$RABBITMQ_OAUTH_CLIENT_SECRET"

# 4. Create the realm role used for cross-app admin promotion
$KC create roles -r wbd -s name=app-admin \
  -s "description=Grants admin perms across all wbd-realm apps that recognise this role"

# 5. Create the first operator user and grant realm-admin + app-admin
$KC create users -r wbd \
  -s username=r.de.ren -s email=r.de.ren@brabantsedelta.nl -s emailVerified=true \
  -s firstName=R -s lastName='de Ren' -s enabled=true \
  -s 'requiredActions=["UPDATE_PASSWORD"]'
$KC set-password -r wbd --username r.de.ren --new-password '<initial-temp-password>' --temporary
$KC add-roles -r wbd --uusername r.de.ren --cclientid realm-management --rolename realm-admin
$KC add-roles -r wbd --uusername r.de.ren --rolename app-admin

# 6. Final per-app wiring
#    - Gitea: run `gitea admin user create --admin --username r.de.ren ...` then
#             `gitea admin auth add-oauth --name keycloak ...` (see stacks/gitea/README.md)
#    - Everything else picks up its OIDC config from .env on next start.

docker compose restart grafana node-red jenkins jupyterhub

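After that, every vhost in the smoke-test table redirects unauthenticated users to Keycloak. To onboard a teammate, add a user at https://auth.wbd-rd.nl/admin/wbd/console/ → Users → Add user. New users get viewer/read-only permissions across all apps. Assigning the app-admin realm role promotes them to Admin in Grafana; Node-RED and JupyterHub admins are still listed in NODERED_ADMIN_USERS and JUPYTERHUB_ADMIN_USERS, and Jenkins admin rights are granted per user through matrix-auth.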
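
When an oauth2-proxy callback fails, it helps to look at the aud claim of the access token directly. The following is a minimal sketch, assuming a token is already at hand (how it is obtained is outside this example); jwt_payload is a hypothetical helper and relies on the common `base64 -d` utility.

```shell
# Print the decoded payload (the claims JSON) of a JWT passed as $1.
jwt_payload() {
  # The payload is the second dot-separated segment, base64url-encoded:
  # translate the URL-safe alphabet back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits '=' padding; restore it to a multiple of 4 characters.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}
```

With the hardcoded audience mapper in place, the printed JSON for a token issued to the mlflow client should list mlflow in aud. The helper only decodes the token and does not verify its signature.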

Adding a new stack

  1. Create stacks/<name>/ with compose.yml, .env.example, README.md.
  2. Uncomment (or add) the include: entry in the top-level compose.yml.
  3. Add the stack's env vars to the top-level .env.example.
  4. docker compose pull && docker compose up -d.
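
Step 1 can be scripted as a small scaffold. A minimal sketch, assuming it is run from the cloud directory; scaffold_stack is a hypothetical helper, and it only creates empty placeholder files.

```shell
# Hypothetical helper for step 1: create stacks/<name>/ with the three
# expected files. Files that already exist are left untouched.
scaffold_stack() {
  dir="stacks/$1"
  mkdir -p "$dir" || return 1
  for f in compose.yml .env.example README.md; do
    [ -e "$dir/$f" ] || : > "$dir/$f"
  done
}
```

For example, `scaffold_stack mystack` creates stacks/mystack/compose.yml, .env.example, and README.md, ready to be filled in before steps 2 to 4.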