feat(sso): wire Keycloak SSO end-to-end across all apps

New stack:
- stacks/oauth2-proxy/ — per-app sidecars (mlflow, portainer, rabbitmq)
  that gate vhosts via nginx auth_request against Keycloak's wbd realm.

Native OIDC wired into:
- grafana       (generic_oauth, role-attribute-path → Admin/Editor/Viewer)
- jupyterhub    (oauthenticator.GenericOAuthenticator)
- node-red      (passport-openidconnect; in-memory state store + users()
                 resolver because adminAuth doesn't expose req.session)
- jenkins       (oic-auth plugin via JCasC; matrix-auth for authz; setup
                 wizard suppressed; custom image with plugins.txt)

Infra fixes uncovered while bringing the above online:
- nginx-proxy: bump proxy_buffer_size to 16k so oauth2-proxy callbacks
  don't 502 on the JWT-bearing Set-Cookie header.
- nginx-proxy: add `resolver 127.0.0.11 valid=30s` so service names
  re-resolve after sidecar recreates (was cross-wiring oauth2-proxy
  upstreams after restart).
- jupyterhub: pass --allow-root to the singleuser spawner (hub runs as
  root inside its container; jupyter-server refused root without flag).
- jupyterhub Dockerfile: install jupyterlab + notebook so
  SimpleLocalProcessSpawner has something to launch.
- node-red Dockerfile: install passport-openidconnect into the image
  so settings.js can require() it.
- portainer: pre-seed local admin via --admin-password=<bcrypt-hash>
  so the 5-minute "no admin → lockout" timer can never trigger.
- deploy.sh: restore executable bit (was 644 in repo).

Admin/viewer policy:
- Created realm role `app-admin` in keycloak wbd realm.
- Grafana maps app-admin → Admin (default Viewer).
- Jenkins matrix-auth grants r.de.ren Overall/Administer; authenticated
  users get Overall/Read + Job/Read + View/Read.
- Node-RED: NODERED_ADMIN_USERS env list → permissions "*", others
  ["read"]. (TODO: switch to app-admin realm role.)
- JupyterHub: JUPYTERHUB_ADMIN_USERS env list. (Same TODO.)
- Gitea: r.de.ren pre-created as local admin; OIDC auto-links via email.

Docs:
- README, cloud/README, stacks/oauth2-proxy/README, and per-stack
  READMEs updated to reflect the new state and remove resolved TODOs.
- cloud/.env.example gains all the new OIDC client + cookie-secret keys.
- cloud/README documents the full kcadm realm bootstrap, including the
  hardcoded-audience mapper and post-logout redirect URIs that are
  non-obvious gotchas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:34:37 +00:00
parent c91b475cd1
commit 33a794e35d
31 changed files with 833 additions and 98 deletions

@@ -4,13 +4,25 @@ Docker container management UI — the "operator console" for cloud and edge.
## Access
Portainer ingresses through nginx-proxy: `https://ops.wbd-rd.nl/`. No host port is published by default.
For emergency ops (nginx down, etc.), uncomment the `ports:` block in `compose.yml` and `docker compose up -d portainer` to expose `:9443` and `:8000` directly.
## Auth
## First-run admin
Two layers:
On first visit, Portainer prompts for an admin username and password. Use a long random password; this account is break-glass — your daily login should come via Keycloak OIDC once that gate is wired (see TODO).
1. **Keycloak SSO gate** — the nginx vhost calls `auth_request` against `oauth2-proxy-portainer` (Keycloak `wbd` realm, client `portainer-ce`). Anyone not in the realm is bounced to Keycloak login.
2. **Portainer local admin** — once past the SSO gate, Portainer asks for its own credentials. Portainer-CE has no native OIDC, so there's no way to skip this second step on CE. The admin user is **pre-seeded** at boot via `--admin-password=<bcrypt-hash>` (see compose), with the hash stored in `.env` as `PORTAINER_ADMIN_PASSWORD_HASH`.
> Pre-seeding the admin bypasses Portainer's "5-minute setup window or lockout" behavior on fresh installs.
### Generating the bcrypt hash
```bash
docker run --rm python:3.13-alpine sh -c \
"pip install -q bcrypt && python -c \"import bcrypt; print(bcrypt.hashpw(b'<your-password>', bcrypt.gensalt()).decode())\""
```
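As a sanity check before the value goes into `.env`, the sketch below tests whether a string has the general shape of a bcrypt hash: a `$2a$`, `$2b$`, or `$2y$` prefix, a two-digit cost, and 53 characters of salt and digest. The helper name `is_bcrypt_hash` is hypothetical and not part of this repo, and the check only looks at the shape; it does not prove that Portainer will accept the value.

```bash
# Hypothetical helper (not in this repo): succeeds when "$1" looks like a bcrypt hash.
is_bcrypt_hash() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

# Usage: paste the unescaped hash (single $) between the single quotes.
if is_bcrypt_hash '<paste-hash-here>'; then
  echo "looks like a bcrypt hash"
else
  echo "not a bcrypt hash; check the docker run output"
fi
```

Run the check on the raw output of the generator, before any `$` characters are doubled for Compose.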
Double every `$` in the resulting hash before pasting it into `.env` (`$2b$` → `$$2b$$`); Compose treats a single `$` as the start of a variable reference.
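The doubling can be done mechanically rather than by hand. This is a minimal sketch, assuming a POSIX shell with `sed`; the function name `escape_for_compose` and the sample value are hypothetical, and the sample is a placeholder, not a real hash.

```bash
# Hypothetical helper (not in this repo): double every `$` so that Compose
# does not treat it as the start of a variable reference.
escape_for_compose() {
  printf '%s\n' "$1" | sed 's/\$/$$/g'
}

# Placeholder input for illustration only; single quotes keep the shell
# from expanding the `$` characters.
escape_for_compose '$2b$12$EXAMPLEONLY'
# prints: $$2b$$12$$EXAMPLEONLY
```

The escaped output is what would go after `PORTAINER_ADMIN_PASSWORD_HASH=` in `.env`.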
## Edge-agent topology
@@ -19,7 +31,8 @@ Port `8000` accepts reverse tunnels from edge sites running the `portainer/agent
## Networks
- **mgmt** — Docker management plane
- **Docker socket**: read-only mount; *effectively root-equivalent* on the host. Front with Keycloak SSO as soon as auth is wired.
- **app** — nginx-proxy reaches portainer:9443 from here
- **Docker socket**: read-only mount; *effectively root-equivalent* on the host. The Keycloak SSO gate is what limits who can talk to Portainer.
## Volumes
@@ -27,6 +40,6 @@ Port `8000` accepts reverse tunnels from edge sites running the `portainer/agent
## TODO
- Keycloak OIDC auth (Portainer CE needs a frontend gate; Business Edition has native OIDC if budget allows)
- Edge-agent provisioning workflow per site (agent secret, registration call)
- Disable self-signed `:9443` access after nginx-proxy goes live (operational hygiene)
- Map the `app-admin` Keycloak realm role to a Portainer team via the OAuth2 team-sync API (CE supports this) so promotion doesn't require manual Portainer-side admin clicks
- Drop the direct `:9443` host port permanently (currently commented out but still available)

@@ -2,6 +2,11 @@
# Networks: mgmt (docker socket plane) + app (nginx-proxy reaches HTTPS upstream)
# Ingress: nginx-proxy → portainer:9443 (self-signed upstream cert) → ops.wbd-rd.nl
#
# Auth: gated by oauth2-proxy (Keycloak wbd realm) AT the nginx layer; Portainer
# itself uses its local admin account (Portainer-CE has no native OIDC).
# The admin user is pre-seeded via --admin-password so the 5-minute setup-window
# lockout that Portainer applies to fresh installs can never trigger.
#
# Direct :9443 host access is intentionally NOT published anymore — re-enable
# only for emergency ops by uncommenting the `ports:` block below.
@@ -10,6 +15,8 @@ services:
    image: portainer/portainer-ce:2.21.4
    restart: unless-stopped
    networks: [mgmt, app]
    command:
      - --admin-password=${PORTAINER_ADMIN_PASSWORD_HASH}
    # ports:
    #   - "9443:9443"   # HTTPS UI direct access (emergency ops only)
    #   - "8000:8000"   # Edge-agent reverse tunnel (open when wiring edges)