infra/stacks/jupyterhub/README.md

feat: SQL=postgres, nginx+certbot, MQTT split, ML stacks, gitea HTTPS-only, gemaal1 site Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks. Locked decisions - sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning - nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO). Chose stock over nginxproxy/nginx-proxy because stream{} is required for MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883. - gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published. MQTT split - Remove stacks/mqtt placeholder. - Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin), used at both cloud and edge. External MQTT clients reach cloud broker via nginx stream-proxy on 8883. - Add stacks/mosquitto — reserved for the FROST (SensorThings) stack only. Cloud-only. Internal to its own stack; no external ingress. ML / notebooks (cloud-only) - stacks/mlflow — experiment tracking + model registry. Postgres backend on sql stack; local volume for artifacts (S3/MinIO is a TODO). - stacks/jupyterhub — multi-user notebook server. DockerSpawner via mounted docker.sock; users spawn into cloud-app network so they can reach mlflow, influxdb (via grafana), rabbitmq. Sites - sites/gemaal1 — first edge deployment scaffold. Site-local override template for binding nginx to PLANT_LAN_IP. Docs - README + docs/architecture.md updated: stacks table now lists 15 stacks, ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy section locked, MQTT-split section added, Gitea HTTPS-only noted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 13:22:46 +02:00
# jupyterhub
feat(sso): wire Keycloak SSO end-to-end across all apps New stack: - stacks/oauth2-proxy/ — per-app sidecars (mlflow, portainer, rabbitmq) that gate vhosts via nginx auth_request against Keycloak's wbd realm. Native OIDC wired into: - grafana (generic_oauth, role-attribute-path → Admin/Editor/Viewer) - jupyterhub (oauthenticator.GenericOAuthenticator) - node-red (passport-openidconnect; in-memory state store + users() resolver because adminAuth doesn't expose req.session) - jenkins (oic-auth plugin via JCasC; matrix-auth for authz; setup wizard suppressed; custom image with plugins.txt) Infra fixes uncovered while bringing the above online: - nginx-proxy: bump proxy_buffer_size to 16k so oauth2-proxy callbacks don't 502 on the JWT-bearing Set-Cookie header. - nginx-proxy: add `resolver 127.0.0.11 valid=30s` so service names re-resolve after sidecar recreates (was cross-wiring oauth2-proxy upstreams after restart). - jupyterhub: pass --allow-root to the singleuser spawner (hub runs as root inside its container; jupyter-server refused root without flag). - jupyterhub Dockerfile: install jupyterlab + notebook so SimpleLocalProcessSpawner has something to launch. - node-red Dockerfile: install passport-openidconnect into the image so settings.js can require() it. - portainer: pre-seed local admin via --admin-password=<bcrypt-hash> so the 5-minute "no admin → lockout" timer can never trigger. - deploy.sh: restore executable bit (was 644 in repo). Admin/viewer policy: - Created realm role `app-admin` in keycloak wbd realm. - Grafana maps app-admin → Admin (default Viewer). - Jenkins matrix-auth grants r.de.ren Overall/Administer, authenticated users get Overall/Read + Job/Read + View/Read. - Node-RED: NODERED_ADMIN_USERS env list → permissions "*", others ["read"]. (TODO: switch to app-admin realm role.) - JupyterHub: JUPYTERHUB_ADMIN_USERS env list. (Same TODO.) - Gitea: r.de.ren pre-created as local admin; OIDC auto-links via email. 
Docs: - README, cloud/README, stacks/oauth2-proxy/README, and per-stack READMEs updated to reflect the new state and remove resolved TODOs. - cloud/.env.example gains all the new OIDC client + cookie-secret keys. - cloud/README documents the full kcadm realm bootstrap, including the hardcoded-audience mapper and post-logout redirect URIs that are non-obvious gotchas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:34:37 +00:00
Multi-user JupyterHub. **Cloud-only.**
- **Hostname**: `hub.wbd-rd.nl`
- **Networks**: `app` (UI proxied) + `mgmt` (Docker socket — for DockerSpawner once we switch to it)
- **Config**: `config/jupyterhub_config.py`
- **Image**: built locally (`cloud-jupyterhub:5`) — upstream JupyterHub + `oauthenticator` + `jupyterlab` + `notebook`. See `Dockerfile`.
## Auth
Keycloak OIDC via `oauthenticator.generic.GenericOAuthenticator`. All authenticated users in the `wbd` realm can sign in (`c.GenericOAuthenticator.allow_all = True`).
Admin promotion is currently driven by the `JUPYTERHUB_ADMIN_USERS` environment variable (a comma-separated list of email addresses). Switching to a Keycloak realm-role check (`app-admin`) is a TODO.
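A minimal sketch of how the settings above might be expressed in `config/jupyterhub_config.py`, not a copy of the actual file. The helper names and the `KEYCLOAK_BASE_URL` variable are hypothetical, and the endpoint paths assume a Keycloak version that serves realms without the legacy `/auth` prefix. Client ID, client secret, and callback URL are left out and would also need to be set.

```python
import os


def keycloak_endpoints(base_url, realm):
    """Return the standard OIDC endpoint URLs for a Keycloak realm."""
    root = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect"
    return {
        "authorize_url": f"{root}/auth",
        "token_url": f"{root}/token",
        "userdata_url": f"{root}/userinfo",
    }


def parse_admin_users(raw):
    """Split a comma-separated string into a set of trimmed, non-empty names."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def configure_auth(c, env=os.environ):
    """Apply the Keycloak settings to a JupyterHub config object `c`."""
    # KEYCLOAK_BASE_URL is an assumed variable name, not taken from the repo.
    urls = keycloak_endpoints(env["KEYCLOAK_BASE_URL"], "wbd")
    c.JupyterHub.authenticator_class = "oauthenticator.generic.GenericOAuthenticator"
    c.GenericOAuthenticator.authorize_url = urls["authorize_url"]
    c.GenericOAuthenticator.token_url = urls["token_url"]
    c.GenericOAuthenticator.userdata_url = urls["userdata_url"]
    # Assumption: usernames are emails, since the admin list holds emails.
    c.GenericOAuthenticator.username_claim = "email"
    c.GenericOAuthenticator.allow_all = True
    c.GenericOAuthenticator.admin_users = parse_admin_users(
        env.get("JUPYTERHUB_ADMIN_USERS", "")
    )
```

Inside the real config file this would be called as `configure_auth(c)` after `c = get_config()`.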
## Spawner
**Current**: `SimpleLocalProcessSpawner` ("simple") — every user's notebook runs as a process inside the hub container itself, sharing the same filesystem. The spawner passes `--allow-root` to `jupyterhub-singleuser` because the hub container runs as root and the singleuser server refuses root without that flag.
This is fine for one or two operators, but it is **not** the production shape we want: users are not isolated from each other or from the hub.
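For reference, the current spawner setup could look roughly like the sketch below. The function name is hypothetical and the real `jupyterhub_config.py` may set these values differently; the two traits themselves (`spawner_class` and `Spawner.args`) are standard JupyterHub configuration.

```python
def configure_simple_spawner(c):
    """Select the in-container spawner and let single-user servers run as root."""
    # "simple" is JupyterHub's registered short name for SimpleLocalProcessSpawner.
    c.JupyterHub.spawner_class = "simple"
    # Extra command-line arguments appended to the single-user server command.
    c.Spawner.args = ["--allow-root"]
```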
### TODO — switch to DockerSpawner
Half of the wiring is already in the repo:
- The `mgmt` network is attached to the hub
- `/var/run/docker.sock` is mounted into the hub
- `DOCKER_NOTEBOOK_IMAGE` is set in `.env`
To switch, change `jupyterhub_config.py`:
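```python
import os  # skip if jupyterhub_config.py already imports it

c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
# Both variables must exist in the hub's environment; os.environ[...] raises KeyError otherwise.
c.DockerSpawner.image = os.environ["DOCKER_NOTEBOOK_IMAGE"]
c.DockerSpawner.network_name = os.environ["DOCKER_NETWORK_NAME"]
c.DockerSpawner.notebook_dir = "/home/jovyan/work"
# One named volume per user, mounted at the notebook directory.
c.DockerSpawner.volumes = {"jupyter-user-{username}": "/home/jovyan/work"}
# Delete each user container when its server stops.
c.DockerSpawner.remove = True
```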
…and add `dockerspawner` to the `Dockerfile` pip install.
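The switch will probably also need the hub to be reachable from the spawned containers, because JupyterHub binds its internal API to `127.0.0.1` by default. A hedged sketch of those extra settings follows; the function name, the `HUB_CONNECT_HOST` variable, and its `jupyterhub` default are assumptions and should be replaced by whatever name the hub container actually has on the shared network.

```python
import os


def configure_hub_networking(c, env=os.environ):
    """Make the hub API reachable from user containers on the Docker network."""
    # Listen on all interfaces inside the hub container, not only 127.0.0.1.
    c.JupyterHub.hub_ip = "0.0.0.0"
    # Hostname that user containers use to call back to the hub.
    # HUB_CONNECT_HOST and its default are hypothetical, not from the repo.
    c.JupyterHub.hub_connect_ip = env.get("HUB_CONNECT_HOST", "jupyterhub")
    # Reach user containers over the Docker network instead of published ports.
    c.DockerSpawner.use_internal_ip = True
```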
## Other TODO
- Switch admin lookup from env-list to `app-admin` realm role
- Per-user persistent volume policy + size limits
- CPU / memory limits per user container
- Idle-server culling (`jupyterhub-idle-culler` service)
- Project-specific notebook image with mlflow/influx/rabbitmq clients preinstalled
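Two of the items above (idle culling and per-container limits) could be sketched as follows. This follows the service and role layout documented by `jupyterhub-idle-culler`, which would have to be added to the `Dockerfile` pip install. The function names, the one-hour timeout, and the limit values are placeholders, and the limits only take effect once DockerSpawner is in use.

```python
import sys


def idle_culler_config(timeout_seconds=3600):
    """Build the service and role entries for jupyterhub-idle-culler."""
    service_name = "jupyterhub-idle-culler-service"
    services = [
        {
            "name": service_name,
            "command": [
                sys.executable,
                "-m",
                "jupyterhub_idle_culler",
                f"--timeout={timeout_seconds}",
            ],
        }
    ]
    roles = [
        {
            "name": "jupyterhub-idle-culler-role",
            # Scopes the culler needs to list servers and stop idle ones.
            "scopes": [
                "list:users",
                "read:users:activity",
                "read:servers",
                "delete:servers",
            ],
            "services": [service_name],
        }
    ]
    return services, roles


def configure_limits_and_culling(c):
    """Register the culler service and set per-user resource limits on `c`."""
    services, roles = idle_culler_config()
    c.JupyterHub.services = services
    c.JupyterHub.load_roles = roles
    # Placeholder limits per user container.
    c.DockerSpawner.mem_limit = "2G"
    c.DockerSpawner.cpu_limit = 1.0
```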