Files
infra/stacks/mlflow/README.md

13 lines
737 B
Markdown
Raw Normal View History

feat: SQL=postgres, nginx+certbot, MQTT split, ML stacks, gitea HTTPS-only, gemaal1 site Round-2 changes locking in scaffold-phase decisions and adding ML/notebook stacks. Locked decisions - sql: postgres 16-alpine (was TBD); init.d/ mount for per-app DB provisioning - nginx-proxy: stock nginx + certbot sidecar (was nginx:alpine TODO). Chose stock over nginxproxy/nginx-proxy because stream{} is required for MQTT-TLS reverse-proxy on tcp/8883 to rabbitmq:1883. - gitea: HTTPS-only (DISABLE_SSH=true). No SSH port published. MQTT split - Remove stacks/mqtt placeholder. - Add stacks/rabbitmq — general-purpose broker (AMQP + MQTT plugin), used at both cloud and edge. External MQTT clients reach cloud broker via nginx stream-proxy on 8883. - Add stacks/mosquitto — reserved for the FROST (SensorThings) stack only. Cloud-only. Internal to its own stack; no external ingress. ML / notebooks (cloud-only) - stacks/mlflow — experiment tracking + model registry. Postgres backend on sql stack; local volume for artifacts (S3/MinIO is a TODO). - stacks/jupyterhub — multi-user notebook server. DockerSpawner via mounted docker.sock; users spawn into cloud-app network so they can reach mlflow, influxdb (via grafana), rabbitmq. Sites - sites/gemaal1 — first edge deployment scaffold. Site-local override template for binding nginx to PLANT_LAN_IP. Docs - README + docs/architecture.md updated: stacks table now lists 15 stacks, ingress + attachment tables reflect mlflow/jupyterhub, TLS strategy section locked, MQTT-split section added, Gitea HTTPS-only noted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 13:22:46 +02:00
# mlflow
MLflow tracking server + model registry. Used by data scientists running experiments from JupyterHub or local laptops. **Cloud-only.**
- **Networks**: `app` (UI on port 5000, reverse-proxied at `/mlflow` or subdomain) + `data` (postgres backend in `sql`)
- **Backend store**: postgres database `mlflow` — must be provisioned by `sql/config/init.d/`
- **Artifact store**: local volume `mlflow-artifacts`. Switch to S3/MinIO when artifact volume grows beyond a few GB.
- **TODO**:
- Provision `mlflow` DB + role in `sql` init scripts
- Keycloak OIDC via nginx `auth_request` (MLflow has no native auth — must front-end it)
- MinIO sidecar for S3-compatible artifact store
- Retention / cleanup policy for stale runs