From 14140725bc527bcdd253a163f57353dad1570d01 Mon Sep 17 00:00:00 2001
From: znetsixe
Date: Tue, 26 May 2026 17:32:20 +0200
Subject: [PATCH] =?UTF-8?q?chore:=20workflow=20artifacts=20=E2=80=94=20res?=
 =?UTF-8?q?earch=20brief=20+=20dashboardAPI=20v2=20PRD=20+=20submodule=20b?=
 =?UTF-8?q?umps?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bumps machineGroupControl (e1e1977) and pumpingStation (ef07f2a) — example dashboard JSON tweaks committed on each submodule's development branch.
Adds docs/research/ and docs/prd/ for the dashboardAPI v2 graph-aware Grafana generator workflow (Gitea issues #32-#43).
Ignores .prototypes/ — throwaway spike code lives there per the /prototype skill.
---
 .gitignore                                    |  1 +
 ...hboardapi-graph-aware-grafana-generator.md | 82 +++++++++++++++++++
 ...hboardapi-graph-aware-grafana-generator.md | 56 +++++++++++++
 nodes/machineGroupControl                     |  2 +-
 nodes/pumpingStation                          |  2 +-
 5 files changed, 141 insertions(+), 2 deletions(-)
 create mode 100644 docs/prd/dashboardapi-graph-aware-grafana-generator.md
 create mode 100644 docs/research/dashboardapi-graph-aware-grafana-generator.md

diff --git a/.gitignore b/.gitignore
index 04458d3..4149bed 100644
--- a/.gitignore
+++ b/.gitignore
@@ -20,3 +20,4 @@ tools/.env
 .repo-mem/
 .codex
 CLAUDE.local.md
+.prototypes/
diff --git a/docs/prd/dashboardapi-graph-aware-grafana-generator.md b/docs/prd/dashboardapi-graph-aware-grafana-generator.md
new file mode 100644
index 0000000..34e0a82
--- /dev/null
+++ b/docs/prd/dashboardapi-graph-aware-grafana-generator.md
@@ -0,0 +1,82 @@
+# dashboardAPI v2 — graph-aware Grafana dashboard generator
+
+_Date: 2026-05-26 · Owner: R&D · Predecessors: `/grill-me` (in-conversation), [`docs/research/dashboardapi-graph-aware-grafana-generator.md`](../research/dashboardapi-graph-aware-grafana-generator.md)_
+
+One `dashboardAPI` node in a Node-RED flow auto-generates one Grafana dashboard by walking its child-registration graph, composing per-node-type panel templates, and pushing the result to Grafana via HTTP on every Node-RED deploy.
+
+## Problem
+Every EVOLV example flow today carries a hand-authored Node-RED Dashboard tab — the active `pumpingstation-complete-example` flow has 73 `ui-*` nodes (charts, gauges, text widgets, fan-out function nodes) consuming roughly a third of the flow. Every new example replicates this work, and each one diverges in axis ranges, chart configs, and fan-out logic — so the output side is inconsistent across the 10+ example flows we maintain. The same telemetry already lands in InfluxDB via Port 1 of every node, so Grafana could render it natively, but today each Grafana dashboard is hand-authored JSON (`docker/grafana/provisioning/dashboards/pumping-station.json` is the only one that exists, frozen at one node type). Result: R&D spends disproportionate time on dashboard plumbing, examples drift, and Grafana — the better readout — is underused.
+
+## Goals
+1. Dropping a `dashboardAPI` node into a flow and deploying produces a complete Grafana dashboard with no hand-authored JSON.
+2. Adding a new EVOLV node *instance* (e.g. a new measurement child) to a flow adds its panels on the next deploy with zero Grafana edits.
+3. Adding a new EVOLV node *type* requires only a panel template fragment under `nodes/dashboardAPI/src/templates/.json` — no changes to the layout engine.
+4. Cross-example consistency: every example flow's Grafana dashboard uses the same panel set, axis conventions, and dashed-bounds rendering for the same node type.
+5. Node-RED Dashboard tab in example flows shrinks to control-only widgets (mode select, operator demand, calibration, signal injection). Target: ≤15 `ui-*` nodes per example flow.
+
+## Non-goals
+- Sub-second feedback latency from operator action → Grafana visible state. End-to-end ≤15s is acceptable; faster is not pursued.
+- Preserving manual Grafana edits across regenerations. Dashboards are single-source-of-truth from dashboardAPI; manual edits are clobbered on next deploy.
+- Per-instance dashboard customization through the Grafana UI. Templates are centralized and code-owned.
+- Supporting non-EVOLV (third-party) Node-RED node types as panel sources.
+- Live runtime regeneration (no deploy). Regen fires on Node-RED deploy events only.
+- Operator (plant-staff) UX. Sole user is R&D until further notice.
+- Replacing the InfluxDB write path. dashboardAPI v2 reuses the existing `outputUtils.formatForInflux` + `influxdbFormatter` plumbing unchanged.
+
+## Users & scenarios
+Sole user: EVOLV R&D team (Rene, Pim, Janneke, Sjoerd, Dieke, Pieter).
+
+1. **New example flow from scratch.** When R&D builds a new example for `rotatingMachine-complete`, they assemble the node graph (pumpingStation + 3 pumps + measurements), drop in one dashboardAPI, connect each top-level parent to it, and deploy. A Grafana dashboard at the dashboardAPI's UID appears within seconds, with rows per parent and panels per child following the centralized templates.
+2. **Adding a measurement to an existing flow.** When R&D wires a new measurement node as a child of an existing pumpingStation in `pumpingstation-complete-example` and redeploys, the corresponding pump panel gains a `measured` series next to its `predicted` series. No Grafana edit.
+3. **Adding a new EVOLV node type.** When R&D ships a new node type `mixer`, they author `nodes/dashboardAPI/src/templates/mixer.json` (Grafana panel fragment with `${nodeName}` substitution tokens) and bump dashboardAPI's package version. Existing dashboardAPI instances pick up mixer-typed children on next deploy.
+
+## Requirements
+
+### Functional
+1. **F-1.** `dashboardAPI` shall subscribe to `RED.events.on('flows:started')` and, on each event, inspect `payload.diff` to determine whether any of its own subtree (the dashboardAPI node, its registered children, their registered grandchildren) was affected. If yes, regenerate the dashboard. If no, no-op.
+2. **F-2.** On regenerate, `dashboardAPI` shall walk its registered children via `ChildRegistrationUtils.getAllChildren()`, recurse one level per registered child to discover grandchildren, and produce an ordered list `[{softwareType, nodeName, position, children: [...]}, ...]`.
+3. **F-3.** For each node in the graph, `dashboardAPI` shall load the matching template at `nodes/dashboardAPI/src/templates/${softwareType}.json` and substitute the placeholders `${nodeName}`, `${nodeId}`, `${parentName}`, `${dashboardUid}` and any child-list placeholders into the panel JSON.
+4. **F-4.** The layout engine shall compose templates into a single Grafana dashboard JSON with: one row per top-level child of dashboardAPI; nested rows for grandchildren; sequential `gridPos.y` offsets so panels don't overlap.
+5. **F-5.** Parent panels shall **not** repeat metrics that any of their children's templates already emit. The template format declares each panel's `emittedFields` so the composer can filter duplicates from the parent's panel set.
+6. **F-6.** For each child node of type `rotatingMachine`, the panel set shall include: `%control`, `flow`, `delta P`, any registered measurement child's measured values, and `efficiency`. Where the node config exposes operating bounds (e.g. min/max flow), those bounds shall be rendered as dashed reference lines (`fieldConfig.custom.lineStyle = {fill: "dash", dash: [10,10]}` via a `byName` override) on the same panel as the act value.
+7. **F-7.** For each child of type `measurement` registered to a parent that also emits a `predicted` series for the same quantity, the dashboard shall render two panels side by side (predicted left, measured right). If only `predicted` exists, render the predicted panel only. If only `measured` exists, render the measured panel only.
+8. **F-8.** `dashboardAPI` shall POST the assembled dashboard to `POST {grafanaUrl}/api/dashboards/db` with body `{dashboard: , overwrite: true, folderUid: }`, using the configured bearer token in `Authorization: Bearer `. The `dashboard.uid` shall be deterministic from the dashboardAPI node's Node-RED id.
+9. **F-9.** On a successful upsert (HTTP 200), `dashboardAPI` shall log the dashboard URL at info level. On failure (non-2xx, timeout, network error), it shall log at error level with the response body and shall **not** retry; the next deploy is the retry mechanism.
+10. **F-10.** Each node emitting a value with operating bounds shall write the bounds as additional Influx fields named `.min` and `.max` alongside `` itself. The dashed-line override matches these by suffix.
+11. **F-11.** The bearer token shall be stored as a Node-RED encrypted credential, not as a plain `defaults` field. On node startup, if the legacy plain field exists, it is migrated to the credential store and the plain field is cleared, with one info-level log line per migrated instance.
+12. **F-12.** `dashboardAPI` shall expose `msg.topic == "regenerate-dashboard"` as a manual trigger that bypasses the diff check and forces a regenerate.
+
+### Non-functional
+- **N-1. Performance.** Dashboard composition (graph walk + template merge + JSON build, excluding HTTP roundtrip) shall complete in <500ms for a flow with up to 50 registered children.
+- **N-2. Idempotency.** Running the regenerate path twice in a row with no intervening graph change produces a byte-identical dashboard JSON.
+- **N-3. Security.** The bearer token shall never appear in any log line, status update, debug output, or admin endpoint response. Token-bearing HTTP requests shall set TLS verification on when the configured Grafana URL is `https://`.
+- **N-4. Observability.** Every regenerate emits a structured log line via the `logger` shared utility with fields: `dashboardUid`, `childCount`, `grandchildCount`, `compositionDurationMs`, `httpStatus`, `outcome ∈ {success, http-error, network-error, no-diff}`.
+- **N-5. Backward compatibility.** Existing dashboardAPI instances continue to write to InfluxDB exactly as before. The Grafana-push path is additive and disabled if no `grafanaUrl` is configured.
+
+## Constraints & dependencies
+- **Grafana version pinned.** `docker-compose.yml` shall pin to `grafana/grafana:11.3.0` (or whatever specific minor exists at first-issue time) instead of `latest`. The legacy `POST /api/dashboards/db` endpoint is the target; the Grafana 12 Kubernetes-style API is out of scope. This resolves research **O-3**.
+- **Node-RED runtime events.** Depends on `RED.events.on('flows:started')` firing with a `payload.diff` shape (added/changed/removed arrays) — undocumented but stable in current Node-RED versions. Verified by prototype before first issue ships.
+- **InfluxDB write path unchanged.** Reuses existing `outputUtils.formatForInflux` + `influxdbFormatter`. No schema migration to existing telemetry.
+- **Tag schema.** Every Influx field used by a panel must be in the existing emission convention (`_measurement = nodeName`, `_field = type.variant.position.childId`).
+- **Scaffolding to reuse:** `ChildRegistrationUtils.getAllChildren()` (`nodes/generalFunctions/src/helper/childRegistrationUtils.js:104-106`), `extractChildren()` (`nodes/dashboardAPI/src/specificClass.js:151-163`), `grafanaUpsertUrl()` (`:107-110`, URL builder exists, HTTP send missing), `BaseNodeAdapter` lifecycle pattern.
+- **No new npm dependencies** for the HTTP path. Use Node's built-in `https`/`http` modules.
+
+## Success metrics
+1. **Hand-authored Grafana JSON in repo = 0.** Measured by counting JSON files in `docker/grafana/provisioning/dashboards/` minus the dynamically-uploaded ones. Current: 2 (pumping-station.json, coresync-frost-demo.json). Target after rollout: 0 file-based, N dynamic.
+2. **`ui-*` node count per example flow ≤ 15** (down from 73 in the current `pumpingstation-complete-example`). Measured by grepping `examples/*.flow.json` after migration.
+3. **Time-to-first-dashboard for a new example flow ≤ 1 minute of human work** (drop in dashboardAPI, configure URL + token, deploy). Measured by stopwatch on the next example flow that gets built.
+4. **Regression coverage:** every example flow's dashboard URL returns HTTP 200 and renders without panel errors. Measured by an integration test that hits the Grafana API after deploying each example.
+
+## Open questions
+- **O-1. `flows:started` + `diff` reliability across deploy modes.** Source-readable but needs a spike to confirm `diff` cleanly distinguishes "this dashboardAPI's subtree changed" from "an unrelated flow changed", across `full` / `nodes` / `flows` deploy types. → Resolved by `/prototype` before issue I-3 (the lifecycle hook issue) starts.
+- **O-2. Dashed-line `custom.lineStyle` rendering against real Influx series.** Open Grafana bugs [#75259](https://github.com/grafana/grafana/issues/75259) and [#86546](https://github.com/grafana/grafana/issues/86546) may affect us. → Resolved by `/prototype` before issue I-5 (rotatingMachine template) starts.
+- **O-5 (new).** Folder UID handling — does dashboardAPI assume a single Grafana folder for all generated dashboards (configured per-instance), or create per-flow folders? Default: per-instance configured folder UID, optional. If empty, dashboards land in the General folder. → Owner: R&D, deadline: before I-4.
+
+## Out of scope (v2 candidates)
+- Per-instance panel customization through the Grafana UI with merge-on-regen.
+- Operator-facing UX (Grafana role/permission management, embedded dashboards in Node-RED).
+- Auto-discovery of measurement units / axis ranges from node config schemas.
+- Multi-Grafana-instance fanout (push the same dashboard to staging + prod).
+- Grafana alerts / notification policies generated from EVOLV alarm definitions.
+- Dashboard versioning / rollback inside Grafana.
+- Template fragments living next to their owning node (decentralized template discovery).
diff --git a/docs/research/dashboardapi-graph-aware-grafana-generator.md b/docs/research/dashboardapi-graph-aware-grafana-generator.md
new file mode 100644
index 0000000..3b6ecb0
--- /dev/null
+++ b/docs/research/dashboardapi-graph-aware-grafana-generator.md
@@ -0,0 +1,56 @@
+# Research brief: graph-aware Grafana dashboard generator in dashboardAPI
+
+_Date: 2026-05-26_
+_Context: follows `/grill-me` session that locked design constraints; feeds into `/prd`._
+
+## Questions
+1. Node-RED lifecycle: how does a custom node reliably detect "deploy complete" across deploy types?
+2. Prior art: existing Node-RED → Grafana auto-dashboard generators
+3. Grafana HTTP API: idempotent dashboard updates by UID, version conflicts, RBAC
+4. Dynamic min/max envelope pattern: dashed reference lines that vary over time
+5. EVOLV-internal scaffolding already in place
+
+## Design constraints already settled in `/grill-me`
+1. dashboardAPI = dashboard **generator**, not just an InfluxDB writer.
+2. One dashboardAPI instance = one Grafana dashboard. Multiple instances coexist.
+3. Single source of truth: regen on Node-RED deploy **clobbers** manual Grafana edits.
+4. Trigger: HTTP API push from dashboardAPI to Grafana, fired on Node-RED deploy.
+5. Auth: per-flow Grafana service-account token.
+6. Templates centralized in `nodes/dashboardAPI/src/templates/` per node type.
+7. Per-instance `_measurement` = node name (already in `influxdbFormatter`).
+8. **No data duplication** between parent and child panels (MGC shows group-level only).
+9. Predicted-vs-measured = 2 panels side by side; predicted only when no measured registered.
+10. Per-pump panel set: %control / flow / delta P / measured-from-children / efficiency / dashed dynamic bounds.
+11. Static config bounds → **dashed reference lines** that follow the live operating envelope (top/bottom dashed + act value).
+
+## What's already in this codebase
+- **Child registration is fully graph-aware.** `ChildRegistrationUtils` keeps a `Map` with type-aware accessors `getAllChildren()`, `getChildById()`, `getChildrenOfType()`. (`nodes/generalFunctions/src/helper/childRegistrationUtils.js:19-106`)
+- **dashboardAPI already iterates its children.** `extractChildren()` reads `nodeSource.childRegistrationUtils.registeredChildren.values()`. (`nodes/dashboardAPI/src/specificClass.js:151-163`)
+- **Grafana upsert URL is already constructed but not yet dispatched.** `grafanaUpsertUrl()` builds the target URL — the HTTP send is missing. (`nodes/dashboardAPI/src/specificClass.js:107-110`)
+- **InfluxDB schema is `measurement: nodeName`, tags from flattened config** (id, softwareType, role, positionVsParent, uuid, tagCode, geoLocation, category, type, model, unit). (`nodes/generalFunctions/src/helper/outputUtils.js:44,99-117`; `formatters/influxdbFormatter.js:12-20`)
+- **Lifecycle hooks: only `node.on('close')` and `node.on('input')` are used.** No EVOLV node currently subscribes to `RED.events.on('flows:started')` or similar — net-new wiring. (`nodes/generalFunctions/src/nodered/BaseNodeAdapter.js:164,184`)
+- **dashboardAPI's bearer token is stored as a plain `defaults` field, NOT as a Node-RED `credentials:` block** — so it's not encrypted at rest today. (`nodes/dashboardAPI/dashboardAPI.html:15-16`; `src/nodeClass.js:38-42`) **Contradicts the grilling assumption** that "the existing InfluxDB credentials path" is already in place — it isn't.
+- **No outbound external HTTPS pattern exists anywhere in EVOLV nodes.** Net-new code path.
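The missing dispatch step noted in the list above could look roughly like the sketch below. It is an illustration only, not existing EVOLV code: `buildUpsertRequest` and `sendUpsert` are hypothetical names, the upsert URL is assumed to be an absolute `http://` or `https://` URL such as the one `grafanaUpsertUrl()` is described as building, and the body follows the `{dashboard, overwrite: true, folderUid}` shape of the legacy `POST /api/dashboards/db` endpoint. Only Node's built-in `http`/`https` modules are used.

```javascript
const https = require('https');
const http = require('http');

// Hypothetical helper: turns an already-built upsert URL, a token and a dashboard
// object into request options plus a JSON body. Pure function, no I/O.
function buildUpsertRequest(upsertUrl, token, dashboard, folderUid) {
  const url = new URL(upsertUrl);
  const isHttps = url.protocol === 'https:';
  const payload = { dashboard, overwrite: true };
  if (folderUid) payload.folderUid = folderUid; // assumption: omitted when not configured
  const body = JSON.stringify(payload);
  return {
    transport: isHttps ? https : http,
    options: {
      method: 'POST',
      hostname: url.hostname,
      port: url.port || (isHttps ? 443 : 80),
      path: url.pathname + url.search,
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
        Authorization: `Bearer ${token}`,
      },
    },
    body,
  };
}

// Hypothetical helper: sends the request once and resolves with status and raw body.
// No retry here; rejects on network error or timeout.
function sendUpsert(request, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const req = request.transport.request(request.options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: data }));
    });
    req.setTimeout(timeoutMs, () => req.destroy(new Error('Grafana upsert timed out')));
    req.on('error', reject);
    req.end(request.body);
  });
}
```

Splitting request construction from sending keeps the first half testable without a Grafana instance. The token appears only in the `Authorization` header, so a caller should avoid logging the returned `options` object.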
+
+## External options
+- **Legacy Grafana API (`POST /api/dashboards/db` with `overwrite: true`).** Skips version + uid-uniqueness checks → idempotent. Returns `412 Precondition Failed` on stale version when `overwrite=false`. Minimum RBAC: `dashboards:write` scoped to a folder. ([docs](https://grafana.com/docs/grafana/latest/developers/http_api/dashboard/))
+- **Grafana 12 Kubernetes-style API (`/apis/dashboard.grafana.app/v1/...`).** Returns `409 Conflict` instead of `412`. Newer but couples integration to Grafana 12+.
+- **`flows:started` runtime event** fires on every deploy (full / nodes / flows) with `{type, diff}` payload. De-dupe by inspecting `diff.added/changed/removed`. Runtime events are undocumented — must read source. (Node-RED `packages/.../runtime/lib/flows/index.js`)
+- **`nodes-started` event is deprecated** — use `flows:started`.
+- **Dashed-line dynamic bands:** the *only* path that works today is emitting min/max as separate Influx fields + applying `fieldConfig.overrides[].properties[].id = "custom.lineStyle"` with `{fill: "dash", dash: [10,10]}`. Per-series override via `byName` matcher.
+- **Grafana thresholds are static-only** (open issue [grafana/grafana#115398](https://github.com/grafana/grafana/issues/115398) — Needs Prioritisation). Dead end for time-varying bands.
+
+## Prior art
+- **No relevant prior art found.** Every "node-red + grafana" tutorial puts Influx in the middle and hand-builds dashboards. No npm package pushes Grafana dashboards from Node-RED. Greenfield lane.
+- **Grafana Foundation SDK / dashboards-as-code** ([docs](https://grafana.com/docs/grafana/latest/as-code/observability-as-code/foundation-sdk/)) — assumes out-of-band CI generation, not a live Node-RED instance.
+- **Operating-envelope plotting in Grafana** — [community thread 57225](https://community.grafana.com/t/how-to-plot-graph-using-upper-and-lower-bound/57225) asks the exact question, no accepted answer.
+- **Known Grafana bugs around `custom.lineStyle`:** [#75259](https://github.com/grafana/grafana/issues/75259) (transforms) and [#86546](https://github.com/grafana/grafana/issues/86546) (overlapping dashed → solid).
+
+## Open unknowns
+- **(O-1) `flows:started` + `diff` reliability.** Does `diff` cleanly distinguish "this dashboardAPI's flow changed" from "an unrelated flow changed" across all three deploy modes? Source-readable but needs an actual spike to verify edge cases (e.g. a `Modified Nodes` deploy that adds a child measurement to a pumpingStation registered to a dashboardAPI in a different tab). → **Candidate for `/prototype`.**
+- **(O-2) Dashed-line rendering against real Influx series.** Two open Grafana bugs ([#75259](https://github.com/grafana/grafana/issues/75259), [#86546](https://github.com/grafana/grafana/issues/86546)) affect `custom.lineStyle`. Untested whether either bites with EVOLV's emission pattern. → **Candidate for `/prototype`.**
+- **(O-3) Legacy `/api/dashboards/db` vs v12 K8s API.** Which to commit to? Locks integration to a Grafana version family. Local stack uses `grafana/grafana:latest` — version drifts on `docker compose pull`. → PRD-time decision; pin Grafana image.
+- **(O-4) Bearer-token storage migration.** Assumption that "follow existing creds pattern" doesn't hold — dashboardAPI stores it as plain config today. Need to migrate to Node-RED `credentials:` block. Risk: token currently sitting in `flow.json` of users' existing flows. → PRD-time decision; migration step in first issue.
+
+## Recommended next step
+`/prd` — commit the design, resolve O-3 and O-4 explicitly, and queue O-1 and O-2 for `/prototype` before the first issue ships.
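As a possible starting point for the two `/prototype` spikes (O-1 and O-2), the helpers below sketch the logic under test. Both names are hypothetical. `subtreeAffected` assumes the `diff` payload carries `added` / `changed` / `removed` arrays of node ids, as the brief describes, and treats a missing `diff` as "regenerate"; that fallback is an assumption for the spike to confirm. `dashedBoundsOverrides` assumes the bound series are named with `.min` and `.max` suffixes and that Grafana's series display name equals that field name, which depends on the query and is also unverified.

```javascript
// Hypothetical O-1 helper: true when any node touched by the deploy belongs to
// this dashboardAPI's subtree (its own id, children, grandchildren).
// Assumed diff shape: { added: [ids], changed: [ids], removed: [ids] }.
function subtreeAffected(diff, subtreeIds) {
  if (!diff) return true; // assumption: without diff info, regenerate to be safe
  const touched = [].concat(diff.added || [], diff.changed || [], diff.removed || []);
  return touched.some((id) => subtreeIds.has(id));
}

// Hypothetical O-2 helper: builds Grafana field overrides that draw
// `<field>.min` and `<field>.max` as dashed lines via byName matchers.
function dashedBoundsOverrides(field) {
  return ['min', 'max'].map((suffix) => ({
    matcher: { id: 'byName', options: `${field}.${suffix}` },
    properties: [
      { id: 'custom.lineStyle', value: { fill: 'dash', dash: [10, 10] } },
    ],
  }));
}
```

The returned array is meant to be placed under a panel's `fieldConfig.overrides`; the spike would then check the rendered panel against the two `custom.lineStyle` bugs listed under O-2.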
diff --git a/nodes/machineGroupControl b/nodes/machineGroupControl
index ddf2b07..e1e1977 160000
--- a/nodes/machineGroupControl
+++ b/nodes/machineGroupControl
@@ -1 +1 @@
-Subproject commit ddf2b07424d9df63dbc367e0bb70c819c113afaa
+Subproject commit e1e19771393f661bc9237c55b07bd5ff2866dcc5
diff --git a/nodes/pumpingStation b/nodes/pumpingStation
index 2d68a4f..ef07f2a 160000
--- a/nodes/pumpingStation
+++ b/nodes/pumpingStation
@@ -1 +1 @@
-Subproject commit 2d68a4f504acef267c71deabdbb2ccb6b7b85aae
+Subproject commit ef07f2a5b2a6d8ed1cb963845ece7694f11b398d