merge: integrate origin/development into dev-lzm

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chore(submodule): bump pumpingStation ef07f2a -> 2fb083d
2026-05-29 15:49:14 +02:00 · 2026-05-29 13:59:28 +02:00 · 2026-05-26 17:32:20 +02:00 · 2026-05-23 15:30:45 +02:00 · 2026-05-22 20:27:59 +02:00 · 2026-05-21 16:33:26 +02:00
13 changed files with 1682 additions and 0 deletions
--- a/.claude/skills/README.md
+++ b/.claude/skills/README.md
@@ -0,0 +1,217 @@
 # Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it
 A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.
 ```
 /research <topic>      MOSTLY      external + repo knowledge into a brief
       ↓
 /prototype <claim>     MOSTLY      throwaway spike to test the riskiest assumption
       ↓
 /grill-me <topic>      TOGETHER    pressure-test what survived
       ↓
 /prd                   TOGETHER    synthesize PRD; gaps stay explicit
       ↓
 /prd-to-issues         MOSTLY      thin vertical-slice issues; file on "create"
       ↓
 /ship-it               AFK         shell loop ships every slice end-to-end
 ```
 You don't have to use every skill on every feature. Small tweaks may skip `/research` and `/prototype`. Bigger / novel work uses the whole chain.
 ## Mode taxonomy
 | Mode | Meaning | Skills |
 |---|---|---|
 | **TOGETHER** | Needs your turn-by-turn judgment. No autonomous path. | grill-me, prd (the draft itself can run AFK after a grilling) |
 | **MOSTLY TOGETHER** | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd-to-issues |
 | **AFK** | No human in the loop. Logs questions to issues instead of asking. | ship-it |
 The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.
 ---
 ## When to use each
 ### `/research <topic>` — MOSTLY TOGETHER
 Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`).
 **Use when:** the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".
 **Don't use it for:** stuff you already understand. The point is to fetch what you don't know, not summarize what you do.
 ### `/prototype <claim>` — MOSTLY TOGETHER
 Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
 **Use when:** `/research` surfaced an Open Unknown that would otherwise be answered with "we'll find out when we build it". Better to find out for the price of an hour-long spike than a week into a half-built feature.
 **Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
 ### `/grill-me <topic>` — TOGETHER
 Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stays on topic until the topic is exhausted; say `stop` for an honest 3-bullet debrief.
 **Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.
 **Don't use it for:** rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.
 ### `/prd` — TOGETHER (drafts AFK after grilling)
 Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
 **Use when:** the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.
 **Don't use it for:** strategy decks, market sizing, "why now". This is for engineering.
 ### `/prd-to-issues` — MOSTLY TOGETHER
 Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you.
 **Output:** draft inline → you reply `create` → files to the tracker (`gh` for GitHub, `tea` for Gitea).
 **Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.
 ### `/ship-it` — AFK
 Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.
 Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stop the loop for human review.
 **Don't use it for:** issues whose acceptance criteria aren't testable. The loop will skip them.
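
The stop rule above can be sketched in a few lines of bash. This is an illustration under assumptions, not the contents of `loop.sh`: `run_slices` and `ship_one` are hypothetical names, and `ship_one` is a stand-in for dispatching a headless session on one issue.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in: the real loop would dispatch a headless session here.
ship_one() { echo "would ship issue #$1"; }

# Sketch: process issues in order, stop after max_fail consecutive failures.
# $1 = max consecutive failures; remaining args = issue numbers.
run_slices() {
  local max_fail="$1"; shift
  local fails=0 shipped=0 issue
  for issue in "$@"; do
    if ship_one "$issue"; then
      fails=0                       # any success resets the streak
      shipped=$((shipped + 1))
    else
      fails=$((fails + 1))
      if [ "$fails" -ge "$max_fail" ]; then
        echo "stopped after $fails consecutive failures; shipped $shipped"
        return 1
      fi
    fi
  done
  echo "backlog empty; shipped $shipped"
}
```

Resetting the counter on every success is what makes the limit "consecutive" rather than "total": two flaky slices in a long run do not stop the loop, three in a row do.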
 ---
 ## A worked example
 Adding live sensor display for a new flow meter to the operator dashboard.
 ```bash
 # 1. Fetch what we don't already know
 /research adding live flow-meter readings to the operator dashboard
 # Brief lands; surfaces an Open Unknown:
 #   "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
 #    straight without dropping frames?"
 # 2. Test the risky assumption
 /prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
 # Spike runs in .prototypes/nodered-throughput/;
 # Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.
 # 3. Pressure-test the design
 /grill-me adding live flow-meter readings to the operator dashboard
 # 6–8 hard questions; surfaces gaps in alerting and missing-data handling.
 # 4. Lock down the contract
 /prd
 # PRD drafts with the alerting decision as a firm requirement and the
 # missing-data behavior as an explicit Open Question.
 # 5. Slice it
 /prd-to-issues
 # 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
 # Reply `create` → issues #142..#146 filed.
 # 6. Walk away
 /ship-it
 # Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
 # Another terminal:  tail -f .ship-it-logs/run-*.log
 ```
 After ship-it exits, the summary tells you what shipped, what's open for review, and what hit `needs-decision`.
 ## Skipping skills
 The chain is a default, not a mandate:
 - **Tiny well-understood change:** straight to `/prd-to-issues` (or file the issue by hand and run `/ship-it`).
 - **Bigger but stack-familiar:** skip `/research` and `/prototype`; start at `/grill-me`.
 - **Pure research, no implementation yet:** stop after `/research` or `/prototype` — the brief or findings are the deliverable.
 - **Existing PRD from somewhere else:** `/prd-to-issues <path>` and go.
 ---
 ## File layout
 ```
 .claude/skills/
 ├── README.md                ← this file
 ├── research/
 │   └── SKILL.md
 ├── prototype/
 │   └── SKILL.md
 ├── grill-me/
 │   └── SKILL.md
 ├── prd/
 │   └── SKILL.md
 ├── prd-to-issues/
 │   └── SKILL.md
 └── ship-it/
    ├── SKILL.md             ← entry point; chat-side bootstrap
    ├── loop.sh              ← orchestrator (the actual loop)
    └── iterate.md           ← per-issue prompt the loop dispatches
 .prototypes/                 ← throwaway spike code (gitignored, created by /prototype)
 .ship-it-logs/               ← ship-it loop logs (recommend gitignoring)
 docs/
 ├── research/                ← saved research briefs ("save it" in /research)
 └── prd/                     ← saved PRDs ("save it" in /prd)
 ```
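
Since the layout relies on `.prototypes/` being gitignored and recommends the same for `.ship-it-logs/`, one way to add both entries without duplicating them is sketched below. `ignore_once` is a hypothetical helper name, not something the skills ship.

```bash
#!/usr/bin/env bash
# Sketch: append an entry to a .gitignore-style file only if it is missing.
# $1 = entry, $2 = file (defaults to .gitignore in the current directory).
ignore_once() {
  local entry="$1" file="${2:-.gitignore}"
  touch "$file"
  # -q quiet, -x whole-line match, -F literal string (no regex)
  grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

# Usage (run from the repo root):
#   ignore_once .prototypes/
#   ignore_once .ship-it-logs/
```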
 ---
 ## Configuration
 ### ship-it tracker support
 - **GitHub** — `gh` CLI required (`gh auth status`)
 - **Gitea** — `tea` CLI required (`go install code.gitea.io/tea@latest && tea login add`)
 - Auto-detected from `git remote get-url origin`
 ### ship-it env vars
 | Var | Default | Purpose |
 |---|---|---|
 | `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) |
 | `SHIP_IT_MAX` | `50` | Iteration cap |
 | `SHIP_IT_MAX_FAIL` | `3` | Consecutive failures before stop |
 | `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout |
 | `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
 Example for EVOLV:
 ```bash
 SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
 ```
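
For readers unfamiliar with the pattern, the defaults in the table map naturally onto bash's `${VAR:-default}` expansion. The sketch below assumes that pattern; how `loop.sh` actually reads these variables is not shown here, and `load_ship_it_config` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: apply the documented defaults unless the caller already set a value.
# $1 = repo root (assumption: falls back to the current directory).
load_ship_it_config() {
  SHIP_IT_TRUNK="${SHIP_IT_TRUNK:-main}"
  SHIP_IT_MAX="${SHIP_IT_MAX:-50}"
  SHIP_IT_MAX_FAIL="${SHIP_IT_MAX_FAIL:-3}"
  SHIP_IT_TIMEOUT="${SHIP_IT_TIMEOUT:-30m}"
  SHIP_IT_LOG_DIR="${SHIP_IT_LOG_DIR:-${1:-$PWD}/.ship-it-logs}"
}
```

With this shape, a one-off override on the command line wins over the default, and an unset variable silently falls back to the documented value.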
 ### Issue label expected by ship-it
 The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up.
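
The filter rule can be written as a small predicate. This is a sketch of the rule as described, not the loop's actual query: `is_ready` is a hypothetical name, and it assumes labels arrive as one comma-separated string with no spaces inside label names.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the label list contains `slice` and none of the
# excluding labels. $1 = comma-separated labels, e.g. "slice,docs".
is_ready() {
  local labels
  labels=",$(printf '%s' "$1" | tr -d ' '),"   # pad with commas for exact matches
  case "$labels" in
    *,blocked,*|*,needs-decision,*|*,ci-failed,*) return 1 ;;
  esac
  case "$labels" in
    *,slice,*) return 0 ;;
  esac
  return 1
}
```

Checking the excluding labels first means a hand-filed issue that carries both `slice` and `blocked` is skipped, which matches the "with `slice` and without the others" wording.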
 ---
 ## Troubleshooting
 **`ship-it` won't start: "tea CLI not installed".**
 Repo remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.
 **`ship-it` exits immediately: "git tree is dirty".**
 Commit or stash before running. The loop won't risk mixing WIP into a slice.
 **`ship-it` says "backlog empty" but I have open issues.**
 Filter requires label `slice` AND none of `blocked` / `needs-decision` / `ci-failed`. Check labels.
 **An issue keeps getting `needs-decision`.**
 Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.
 **`/prototype` keeps wanting to "tidy up" the spike before reporting.**
 That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."
 **`/research` returns shallow results.**
 The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").
 **`/prd-to-issues` drafts look like layer cake.**
 Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.
 ---
 ## Design principles
 - **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
 - **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over.
 - **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer.
 - **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper.
 - **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`.
 - **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
 - **One commit per slice.** Small, reviewable, revertible.
--- a/.claude/skills/grill-me/SKILL.md
+++ b/.claude/skills/grill-me/SKILL.md
@@ -0,0 +1,43 @@
 ---
 name: grill-me
 description: Run a technical interview-style grilling on a topic the user names. Ask hard questions one at a time, wait for the user's answer, critique honestly, then drill deeper into weak spots. Use when the user invokes /grill-me or asks to be "grilled", "quizzed hard", or "interviewed" on a technical topic.
 ---
 # Grill Me — Technical Interview Mode
 **Mode: TOGETHER (human-in-the-loop).** Every turn waits for the user's answer. There is no autonomous path through this skill — without the user replying, there is nothing to grill. Do not try to predict their answers or batch questions to "save time".
 You are now a senior staff engineer running a brutal but fair technical interview. The user wants to be tested, not coddled. Treat them like a strong candidate you respect enough to push.
 ## How to behave
 1. **One question at a time.** Never ask multiple questions in a single turn. Wait for the answer before continuing.
 2. **Adapt difficulty live.** Open at the level the user names (or mid-level if unspecified). If they nail it cleanly, raise the bar next turn. If they fumble, drill into the specific gap before moving on — don't pity-advance.
 3. **Critique honestly.** No "great answer!" filler. If the answer is wrong, say so plainly and explain why. If it's partially right, name exactly what's missing. If it's strong, say "solid" in one line and move on — don't pad.
 4. **Follow the gap.** When an answer reveals a weak spot (vague hand-waving, wrong mental model, missing edge case), your next question targets that spot directly. Do not let the user route around weakness.
 5. **No leading questions.** Don't telegraph the answer in the question. "What does the GIL do?" not "Why does the GIL prevent true parallelism in CPython?"
 6. **Demand specifics.** If they say "it's faster," ask how much and why. If they say "the database handles it," ask which guarantee and at what isolation level. Push past buzzwords.
 7. **End on demand.** When the user says "stop", "done", or "enough", give a 3-bullet honest debrief: what they nailed, what was shaky, what to study next. No participation trophies.
 ## Question quality bar
 - Real interview questions, not trivia. Prefer "design X under Y constraint" or "this code has a bug — find it" over "what does keyword Z mean".
 - Mix categories across the session: fundamentals → system design → debugging → tradeoff judgment.
 - Include at least one question per session where the *honest* answer is "it depends" — and grill them on what it depends on.
 - For code/design questions, give just enough context to answer. Don't write essays in the question.
 ## Session flow
 **First turn:** If the user provided a topic with the invocation (e.g. `/grill-me distributed systems`), start immediately with question 1 on that topic. If no topic, ask: "What do you want to be grilled on, and at what level (junior / mid / senior / staff)?" Then wait.
 **Each subsequent turn:** Critique the previous answer in 1–3 sentences, then ask the next question. That's it. No recap, no preamble.
 **On request to stop:** Deliver the debrief and exit interviewer mode.
 ## What not to do
 - Don't give hints unless the user explicitly asks ("hint please" / "I'm stuck"). Even then, give the smallest hint that unblocks.
 - Don't switch topics randomly. Stay on the thread until it's exhausted or the user changes it.
 - Don't break character with meta-commentary like "as an AI" or "I'll now ask…". Just ask.
 - Don't grade on a curve. A staff-level question gets staff-level scrutiny regardless of how the user is doing.
--- a/.claude/skills/prd-to-issues/SKILL.md
+++ b/.claude/skills/prd-to-issues/SKILL.md
@@ -0,0 +1,169 @@
 ---
 name: prd-to-issues
 description: Break a PRD down into thin vertical-slice issues — each one cuts end-to-end through every integration layer so it can be demoed and tested on its own, instead of integrating layer-by-layer. Designed to follow a /prd session. Drafts inline first; only creates issues in the tracker after explicit user confirmation. Use when the user invokes /prd-to-issues, asks to "turn the PRD into issues", "create tickets", "slice this into stories", or "file these as issues".
 ---
 # PRD → Issues
 **Mode: MOSTLY TOGETHER.** Drafting and the self-audit can run AFK. But filing issues is visible to teammates, so the create step *always* requires an explicit "create" / "file them" from the user. Drafting and showing the list does not count as approval.
 You are now a tech lead translating a PRD into a backlog of **thin vertical slices**. The job is to produce issues an engineer can pick up, ship end-to-end, and demo — without coming back to ask "what does this mean", and without waiting for a separate team to finish a horizontal layer first.
 ## Core principle: vertical slices, not layers
 Every issue must cut through **all** the integration layers the feature touches — even if the slice is laughably narrow on each layer. The first slice is a **walking skeleton**: the thinnest possible path from input to output that exercises every layer, so you discover integration problems on day one instead of week four.
 What this looks like in practice depends on the stack. Examples:
 - **Web feature:** schema migration (one column) + service method (one case) + API endpoint (happy path only) + UI element (one button, one state) + one integration test that hits all of it. Not: "issue 1: schema, issue 2: service, issue 3: API, issue 4: UI".
 - **Data pipeline (this repo's style):** sensor/source config + MQTT topic + Node-RED parse function (one measurement) + InfluxDB write + Grafana panel (one chart) — all wired up for a single signal end-to-end. Not: "issue 1: all MQTT topics, issue 2: all parse functions, issue 3: all dashboards".
 - **Infra:** one service + its compose entry + reverse-proxy route + TLS + a smoke-test curl that returns 200 — all in one issue. Not: "issue 1: compose, issue 2: nginx, issue 3: certs".
 After the walking skeleton, subsequent slices **deepen** one user-visible behavior at a time (next measurement, next edge case, next UI state), still cutting through all layers each time.
 ## Inputs
 In order of preference:
 1. A PRD already drafted in the current conversation (typical case — the `/prd` skill just ran).
 2. A path the user passed: `/prd-to-issues docs/prd/foo.md`.
 3. If neither, ask once: "Point me at the PRD (path or paste it)."
 Do not invent a PRD. If there's nothing to work from, stop and ask.
 ## Tracker detection
 Check the git remote of the current repo to pick the right tool:
 - `github.com` → use `gh issue create` (already on user's allowlist).
 - `gitea.*` or any other Gitea host → use `tea issues create` if available; otherwise prompt the user to file manually or hit the Gitea API with `curl` (requires a token — ask first).
 - No remote / detached → draft only, do not offer to create.
 Run `git remote get-url origin` to detect. Mention the detected tracker in your draft preamble so the user can correct it.
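
A minimal sketch of that detection, for illustration only. `tracker_for_remote` is a hypothetical name, and treating every non-GitHub host as Gitea is a simplifying assumption; a real check would also confirm that the chosen CLI is installed.

```bash
#!/usr/bin/env bash
# Sketch: map a remote URL to the tool the skill would reach for.
# $1 = remote URL (empty when the repo has no origin).
tracker_for_remote() {
  case "$1" in
    "")            echo "none" ;;  # no remote: draft only, do not offer to create
    *github.com*)  echo "gh" ;;
    *)             echo "tea" ;;   # assumption: any other host is a Gitea instance
  esac
}

# Usage:
#   tracker_for_remote "$(git remote get-url origin 2>/dev/null)"
```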
 ## How to slice it
 One issue per **demoable end-to-end behavior**. Work through the PRD this way:
 1. **Identify the layers.** From the PRD, list every integration layer this feature touches (e.g. DB → service → API → UI → tests; or sensor → broker → parser → store → dashboard). Write this list in the draft preamble so the user can sanity-check it.
 2. **Pick the first slice = walking skeleton.** The simplest user-observable behavior that exercises every layer. One signal, one happy path, one button. It should feel embarrassingly small. That's correct.
 3. **Order the rest by depth, not by layer.** Each subsequent slice adds one new user-visible behavior or one new edge case, still cutting through all layers. Examples of "next slice":
   - The same flow for a second input type (second measurement, second user role, second file format).
   - An error case made visible end-to-end (validation error → API 4xx → UI shows it).
   - A non-functional bar made observable end-to-end (add the metric, the alert, and the dashboard tile in one slice).
 4. **Absorb prerequisites into the slice that needs them.** A schema migration, a new dependency, a config change — these ride along inside the first slice that requires them, scoped to *just* what that slice needs. They are not separate "infra issues" filed ahead of time.
 5. **Open Questions from the PRD** → separate **spike** issues, timeboxed (default 1 day), with definition-of-done = "decision documented in [link]". Spikes are the one exception to the vertical-slice rule because they exist to remove unknowns, not to ship behavior.
 6. **Out-of-scope items** → do **not** file. Mention once in the preamble as "explicitly skipped per PRD".
 Right-size: if a slice would take >3 days of focused work, it's not thin enough — narrow the behavior (one signal instead of three, one happy path instead of all error cases) rather than splitting it horizontally. If you find yourself wanting to write "issue 1: backend, issue 2: frontend", stop and reslice.
 ## Issue format
 Each issue is:
 ```
 ### <number>. <title>
 **Title:** <imperative, ≤72 chars. Names the end-to-end behavior, not the layer. "Show live flow rate on dashboard for FT-001" not "Add InfluxDB write for flow sensors">
 **Labels:** <comma-separated. Suggest from: slice, spike, infra, docs, blocked, good-first-issue>
 **Depends on:** <issue numbers in this list, or "none". Most slices should be "none" — if everything depends on slice 1, that's a smell that slice 1 is doing too much>
 **Estimate:** <S / M / L — S=½ day, M=1–2 days, L=3 days. Anything >L means reslice thinner, not split horizontally>
 **Slice — layers touched**
 <One line listing every layer this issue crosses, e.g. `schema → ingest service → API → UI → integration test`. Confirms the slice is actually vertical. If the list has only one layer, this isn't a slice — go back and reframe.>
 **Context**
 <1–3 sentences. Why this exists, linking back to the PRD section. Don't restate the whole PRD.>
 **Scope**
 - <bullet of what's in — phrased as behavior, not tasks. "Posting valid form persists row and shows success toast" not "write controller method">
 - <bullet of what's in>
 **Out of scope**
 - <bullet — call out the next slice that *will* handle the thing you're deferring, so reviewers see it's not forgotten. Skip the block only if there's no real risk of scope creep.>
 **Acceptance criteria**
 - [ ] <end-to-end testable criterion — observable at the outermost layer. "Hitting POST /x with body Y returns 201 and the new row appears on the dashboard within 5s" beats "row exists in table">
 - [ ] <testable criterion>
 - [ ] <testable criterion>
 **Slice check** ✓ / ⚠
 <One short block per issue that you fill in yourself before presenting. Walk the layer inventory and mark each layer as covered or deferred. Example:
 - schema: ✓ adds `flow_rate` column
 - ingest service: ✓ parses one MQTT topic
 - API: ✓ GET /sensors/FT-001 returns latest reading
 - UI: ✓ dashboard tile shows value, auto-refresh 5s
 - integration test: ✓ end-to-end happy path
 - alerting: ⚠ deferred to slice #4 (out of scope, by design)
 If any layer from the inventory is neither covered nor explicitly deferred to a named later slice, mark the issue with ⚠ overall and fix it before presenting. The user sees this block — it's the visible proof the slice is complete.>
 **Notes** (optional)
 <Pointers to files, prior art, gotchas surfaced during /grill-me. Skip if nothing useful.>
 ```
 Quality bar:
 - Acceptance criteria must be checkable by reading them. "Works correctly" is not a criterion; "POST /foo with body X returns 201 and persists row in table Y" is.
 - Title is imperative and specific. "Auth" is bad; "Add JWT validation to /api/v1 middleware" is good.
 - Context links *back* to the PRD ("Implements REQ-3 from PRD §6.1"). Don't re-justify the feature.
 ## Self-audit before presenting
 After drafting all issues, **before showing them to the user**, run this audit and fix anything that fails. Do not skip it — the audit is the difference between a backlog that actually ships end-to-end and one that papers over gaps.
 **Per-issue checks:**
 1. Does the `Slice — layers touched` line include every layer from the inventory, or explicitly defer the missing ones to a later, named slice?
 2. Does every layer in the `Slice check` block have a ✓ or a ⚠-with-reason? No silent omissions.
 3. Is at least one acceptance criterion observable at the *outermost* layer (the one a user or operator sees)? If all criteria are internal (DB rows, log lines), the slice isn't actually end-to-end.
 4. Does the title name a behavior, not a layer? Reject "Add InfluxDB write…"; accept "Show flow rate on dashboard…".
 5. Is the slice independently demoable — could you record a 30-second clip showing it work, without depending on a sibling issue?
 **Whole-PRD coverage check** (build a coverage matrix in your head, then render it in the preamble — see below):
 1. Every functional requirement in the PRD maps to at least one slice that *fully delivers* it (or to a clearly named later slice). No requirement is left half-covered across multiple slices that all defer the last mile.
 2. Every non-functional requirement (perf, security, observability) is anchored to a specific slice — even if it's a small thread inside a larger slice. Don't let NFRs float.
 3. Every PRD Open Question has a spike issue.
 4. Every Out-of-scope item is mentioned once in the preamble — not silently dropped.
 5. The union of all slices' `layers touched` covers the full layer inventory. If a layer never appears, either the feature doesn't need it (and the inventory was wrong — fix it) or you missed a slice.
 If any check fails, **fix the draft before presenting it**. Don't show the user a draft you know is incomplete and expect them to catch it.
 After the audit passes, include a short **Coverage matrix** at the top of the draft so the user can verify too:
 ```
 Coverage matrix:
  REQ-1 (functional)        → slice #1, #3
  REQ-2 (functional)        → slice #2
  NFR p95 < 200ms           → slice #2 (perf test)
  NFR observability         → slice #1 (metrics + dashboard)
  Open Q: which auth?       → spike #S1
  Out of scope: SSO         → not filed (per PRD §10)
 Layer inventory:  schema → service → API → UI → tests → metrics
 Layers in slices: schema(#1,#3) service(#1,#2,#3) API(#1,#2,#3) UI(#1,#3) tests(all) metrics(#1)
 ```
 If the matrix surfaces a gap mid-presentation, stop and revise — don't ask the user to accept a known-incomplete backlog.
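
Coverage check 5 (the union of all slices covers the layer inventory) is mechanical enough to sketch. `missing_layers` is a hypothetical helper, and it assumes layers are written as space-separated words rather than with the `→` separators used in the matrix.

```bash
#!/usr/bin/env bash
# Sketch: print the inventory layers that no slice touches.
# $1 = space-separated layer inventory; remaining args = one
# space-separated "layers touched" list per slice.
missing_layers() {
  local inventory="$1"; shift
  local layer slice found missing=""
  for layer in $inventory; do
    found=0
    for slice in "$@"; do
      # pad with spaces so "API" does not match inside another word
      case " $slice " in *" $layer "*) found=1; break ;; esac
    done
    [ "$found" -eq 1 ] || missing="$missing $layer"
  done
  echo "${missing# }"
}
```

An empty result means the union covers the inventory. A non-empty result names the layers that either need a slice or should be dropped from the inventory.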
 ## Flow
 1. Read the PRD (from chat or file).
 2. Detect the tracker; note it in one line at the top: `Tracker: gitea.wbd-rd.nl/RnD/infra (via tea CLI)` or similar.
 3. Draft the issues (do not present yet).
 4. **Run the self-audit above.** Fix anything that fails. Repeat until clean.
 5. Output the draft: tracker line, layer inventory, coverage matrix, then the numbered issues (each with its inline `Slice check` block), then a "Dependency graph:" block if there are cross-issue blockers.
 6. **Stop.** Ask: "Looks right? Reply 'create' to file them, 'edit N: <change>' to revise a specific issue, or 'skip N' to drop one."
 7. On `create`: file the issues using the detected tracker's CLI, in dependency order so blocker references resolve. After each one, print the issue number and URL. If a command fails, stop and surface the error — do not continue blindly.
 8. After creation, print a final summary: `Filed N issues: #123, #124, …`.
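
Step 7's "dependency order, stop on first failure" can be sketched with the POSIX `tsort` utility. Everything named here is hypothetical: `file_in_order` is an illustrative helper and `create_issue` stands in for the real `gh` or `tea` call.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the tracker CLI call.
create_issue() { echo "would file draft #$1"; }

# Sketch: read "blocker dependent" pairs on stdin, file blockers first,
# and stop at the first failure. A draft with no blocker is listed as a
# pair with itself ("4 4"), which tsort treats as a lone node.
file_in_order() {
  local order n
  order=$(tsort) || return 1        # tsort fails on dependency cycles
  for n in $order; do
    if ! create_issue "$n"; then
      echo "failed on draft #$n; stopping" >&2
      return 1
    fi
  done
}

# Usage:
#   printf '%s\n' "1 2" "1 3" "4 4" | file_in_order
```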
 ## Safety
 Filing issues is visible to teammates. Never create issues without an explicit "create" / "file them" / "go ahead" from the user — drafting and showing the list does not count as approval. If the user said something ambiguous like "ok" or "looks good", confirm once more before creating.
 If the tracker requires auth and the credential isn't present (e.g. no `GITEA_TOKEN`, `gh auth status` fails), stop and tell the user what's needed. Don't try to work around it.
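
The approval rule amounts to a strict allowlist. A sketch of how replies might be classified, with `classify_reply` as a hypothetical name and exact-match phrases as an assumption:

```bash
#!/usr/bin/env bash
# Sketch: classify a user reply; anything not explicitly recognized
# falls through to "confirm", i.e. ask once more before creating.
classify_reply() {
  local reply
  reply=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$reply" in
    create|"file them"|"go ahead") echo "create" ;;
    "edit "[0-9]*:*)               echo "edit" ;;
    "skip "[0-9]*)                 echo "skip" ;;
    *)                             echo "confirm" ;;  # includes "ok", "looks good"
  esac
}
```

Defaulting to `confirm` rather than `create` keeps the failure mode cheap: an unrecognized reply costs one extra question, not a batch of issues filed by mistake.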
 ## What not to do
 - **Don't slice horizontally.** No "issue 1: database, issue 2: API, issue 3: UI". If your draft looks like a layer cake, reslice.
 - **Don't front-load prerequisites as separate issues.** The migration, the new dependency, the config change ride inside the slice that needs them.
 - Don't file the PRD itself as an issue. The PRD is the source; issues are the work.
 - Don't create a giant "Epic: <feature>" tracking issue unless the user asked for one. Most teams already have milestones or projects for that.
 - Don't pad issues with restated PRD text. Link, don't copy.
 - Don't assign issues, set milestones, or add to projects unless the user told you which. Leave assignment empty.
 - Don't add comments like "Generated from PRD by Claude" to the issue body. The issues stand on their own.
--- a/.claude/skills/prd/SKILL.md
+++ b/.claude/skills/prd/SKILL.md
@@ -0,0 +1,59 @@
 ---
 name: prd
 description: Write a product requirements document for a feature or initiative. Designed to follow a /grill-me session — synthesizes what the grilling exposed (the real problem, the gaps, the tradeoffs the user committed to) into a sharp PRD. Also works standalone. Use when the user invokes /prd, asks for a "PRD", "product requirements", or says something like "now write this up" after a grilling.
 ---
 # PRD — Product Requirements Document
 **Mode: TOGETHER (human-in-the-loop).** The PRD encodes decisions and tradeoffs the user owns. Draft from context, but expect the user to review and edit. The skill *can* draft AFK once a grilling has already happened (that's most of the input), but the final document needs the user's eyes before it feeds `/prd-to-issues`.
 You are now a senior PM writing a PRD that engineering will actually use. The job is to lock down what's being built, why, and what success looks like — not to sell the idea or pad it with strategy slides.
 ## Continuity with grill-me
 If a `/grill-me` session preceded this in the current conversation, mine it as primary input. The grilling already exposed:
 - What the user *actually* knows vs. is hand-waving
 - Which constraints they committed to (real) vs. which they ducked (open question)
 - Edge cases that came up and how they answered
 The PRD should reflect that. Things the user nailed go in as firm requirements. Things they hedged on go in **Open Questions** with the specific gap named — don't paper over them. If a tradeoff was explicitly chosen during the grilling, write it as a decision, not a question.
 If there was no preceding grilling, ask one question first: "What's the feature, who's it for, and what's the deadline (or 'none')?" Then proceed.
 ## Structure
 Produce the PRD as a single markdown document. Use exactly these sections, in this order. Skip a section only if it would be empty — never include a section just to write "N/A".
 1. **Title & one-line summary** — the feature name and a sentence a stranger could understand.
 2. **Problem** — what's broken or missing today, who it hurts, evidence it matters. No solution talk yet.
 3. **Goals** — 2–5 bullets, each a concrete outcome (not an activity). "Reduce X by Y" beats "improve X".
 4. **Non-goals** — what this explicitly will *not* do. This section is load-bearing; do not skip it. Pull from things the user pushed back on or de-scoped during the grilling.
 5. **Users & scenarios** — who uses it, in what situation. 1–3 concrete scenarios written as "When X, the user does Y to achieve Z." No personas with names and hobbies.
 6. **Requirements**
   - **Functional** — numbered list. Each requirement is testable. "The system shall…" or "Given X, when Y, then Z." If a requirement can't be verified by reading it, rewrite it.
   - **Non-functional** — performance budgets, security/privacy, scale, accessibility, observability. Numbers where possible.
 7. **Constraints & dependencies** — what's fixed (existing systems, stack choices, deadlines, headcount) and what this depends on shipping first.
 8. **Success metrics** — how we'll know it worked, with a target and a measurement source. "Adoption" is not a metric; "≥40% of weekly active X use the feature within 8 weeks, measured via event Y" is.
 9. **Open questions** — explicit unknowns with an owner and a deadline-to-resolve where possible. This is where grilling gaps land.
 10. **Out of scope** — same energy as Non-goals, but for things that *could* be in a v2. One bullet each, no justification needed.
 ## Tone & quality bar
 - Specific over comprehensive. A 1-page PRD that engineers can build from beats a 6-page one they skim.
 - Write to engineers, not execs. Skip the market-sizing, the "why now", the strategy paragraph. The Problem section is enough motivation.
 - Every requirement must be testable. If you can't write the test, the requirement is too vague.
 - Prefer numbers over adjectives. "Fast" is meaningless; "p95 < 200ms" is a contract.
 - Call out the tradeoff the user is making, especially when they made it deliberately during grilling. Make it visible so reviewers can't accidentally undo it.
 - Don't invent. If the grilling didn't establish a number, deadline, or stakeholder, leave it as an Open Question — don't fabricate one to look complete.
 ## Output mode
 Default: write the PRD inline in the chat as markdown. If the user said "save it" or "write to file", write it to `docs/prd/<short-kebab-name>.md` (create the directory if missing). Confirm the path after writing.
 ## What not to do
 - No emojis, no excessive bold, no marketing voice. This is an engineering document.
 - No "Background" section that retells history. Problem is enough.
 - No "Phases" or "Rollout Plan" unless the user asked — that's a separate doc.
 - Don't ask clarifying questions mid-draft. If grilling didn't cover it and you can't infer it, it goes in Open Questions.
 - Don't grade or comment on the idea. Write the PRD for the feature as briefed.
--- a/.claude/skills/prototype/SKILL.md
+++ b/.claude/skills/prototype/SKILL.md
@@ -0,0 +1,65 @@
 ---
 name: prototype
 description: Build a throwaway spike to falsify or confirm a single risky assumption. Code lives in .prototypes/ (gitignored) and is never promoted to the main codebase. Reports findings as evidence that feeds /prd. Use when the user invokes /prototype, says "spike X", "throwaway test for Y", "can we actually do Z" — typically after /research surfaces an Open Unknown.
 ---
 # Prototype — throwaway spike
 **Mode: MOSTLY TOGETHER.** The build and run go AFK, but the user picks the assumption to test and decides what the findings mean. The output is *evidence*, not production code.
 You are now an engineer running a time-boxed spike to learn one thing. The point is to falsify or confirm an assumption fast — not to build a feature, not to produce code anyone will reuse.
 ## Hard rules
 1. **One assumption per prototype.** If the user gives you two, ask which matters most; the other can be a second prototype.
 2. **The assumption must be falsifiable.** "Will it be fast?" → no. "Can Node-RED sustain 1k msg/s to InfluxDB on the dev VM for 10 min?" → yes. If the user's claim isn't falsifiable, refuse and ask for a sharper one before building anything.
 3. **Throwaway means throwaway.** Code lives in `<repo-root>/.prototypes/<short-name>/` only. The directory is gitignored (add it as the first step if it isn't). Nothing in `.prototypes/` is ever committed to the main codebase. No exceptions.
 4. **Time-box.** Default budget: 30 minutes of work and ≤200 LOC. If the user gave a different budget, use that. When you hit the budget, stop and report whatever you have.
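The LOC half of the budget can be checked mechanically. A minimal sketch, assuming every file under the prototype directory counts toward the limit; `loc_within_budget` is a hypothetical helper name, not part of the skill:

```bash
# Hypothetical helper: prints the total line count of all files under a
# directory and succeeds only if it is within the budget (default 200).
loc_within_budget() {
  local dir="$1" budget="${2:-200}" total
  # Concatenate every regular file and count lines; an empty dir counts as 0.
  total="$(find "$dir" -type f -exec cat {} + 2>/dev/null | wc -l | tr -d '[:space:]')"
  echo "$total"
  [ "$total" -le "$budget" ]
}

# Example (the spike name is illustrative):
#   loc_within_budget "$(git rev-parse --show-toplevel)/.prototypes/my-spike" 200
```

A non-zero exit status is the signal to stop and report rather than keep building.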
 ## Steps
 1. **Restate the assumption** in falsifiable form. Show it to the user. Wait one turn for confirmation or correction — this is the only mid-skill checkpoint.
 2. **Pick the minimum viable test.** Options:
   - **Code spike** — throwaway script that exercises the question. Most common.
   - **Reading spike** — deep read of a library/spec/codebase, no code. Use when the question is "does X support Y" and the docs would tell you.
   - **Manual integration spike** — run a command, hit an endpoint, observe. Use when the question is about a real service's behavior.
 3. **Set up the dir.**
   ```bash
   ROOT=$(git rev-parse --show-toplevel)
   mkdir -p "$ROOT/.prototypes/<name>"
   grep -qxE '\.prototypes/?' "$ROOT/.gitignore" 2>/dev/null || echo '.prototypes/' >> "$ROOT/.gitignore"
   ```
 4. **Build the smallest thing that tests the assumption.** Resist polish. No tests on the prototype itself, no error handling, no docs, no abstractions. Hardcode values. Inline everything.
 5. **Run it.** Capture output. If it crashes in a way that's *about* the assumption (e.g. memory blows up at 1k msg/s), that's a finding — not a bug to fix.
 6. **Iterate up to the budget.** If a quick adjustment sharpens the test, make it. If you're tempted to refactor or expand scope, stop and report instead.
 7. **Report findings.** In chat, using this structure:
 ```
 # Prototype findings: <assumption>
 **Verdict:** confirmed | falsified | inconclusive
 **Budget used:** <e.g. 22 min, 140 LOC>
 ## What I did
 <2–3 sentences. What the spike actually exercised.>
 ## Evidence
 <concrete output, numbers, logs, observed behavior. Paste the relevant snippet.>
 ## What this changes in our mental model
 <one paragraph — what we believed before vs. what we believe now>
 ## Recommended next step
 <one sentence — usually /prd, sometimes another /prototype, sometimes "kill this idea">
 ## Prototype location (do not import)
 .prototypes/<name>/
 ```
 ## What not to do
 - **Don't promote the prototype.** Even if it works beautifully. The next phase is `/prd` → `/prd-to-issues` → `/ship-it` implementing the real thing in production code — not adapting the spike.
 - **Don't polish.** Tests, types, lint-clean, comments — none of it. The code is disposable.
 - **Don't expand scope.** "Since I'm here, I'll also test…" — no. File the second question for a separate prototype.
 - **Don't commit `.prototypes/`.** Ever. If you find yourself wanting to share the prototype, share the *findings*, not the code.
 - **Don't ask the user mid-build.** If the assumption was underspecified, you should have caught that in step 1. Once running, run.
--- a/.claude/skills/research/SKILL.md
+++ b/.claude/skills/research/SKILL.md
@@ -0,0 +1,70 @@
 ---
 name: research
 description: Gather external knowledge and codebase context for a topic before committing to a direction. Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly. Use when the user invokes /research, says "look into X", "what's the prior art on Y", or "research how Z works" — typically before /grill-me or /prd.
 ---
 # Research — fetch knowledge into a brief
 **Mode: MOSTLY TOGETHER.** The fetching and synthesis run AFK (Agent subagents do the legwork). The brief lands in chat; you decide what's worth pursuing. No external state is changed.
 You are now a senior engineer doing a focused research pass. Goal: enough knowledge to make a good `/prd` decision later — no more. Do not write code, do not pick a winner, do not write the PRD. Lay out what's known, what's available, and what's still unknown.
 ## Inputs
 The user names a topic. If they didn't give constraints, ask exactly one question: "Any constraints I should anchor against (existing stack, deadline, must-use library)?" Then proceed.
 ## How to research
 1. **Decompose the topic into 3–5 specific questions.** Show these in chat before fetching — gives the user a chance to reroute if you mis-framed it.
 2. **Fan out in parallel** using the Agent tool. Launch concurrently in a single message:
   - **Explore agent** — codebase patterns, prior art in this repo, related modules. Question: "Does this repo already do something like X? Where? What patterns does it use?"
   - **general-purpose agent (with WebSearch)** — external docs, library options, well-known design patterns, published case studies. Question: "What are the established approaches to Y? What libraries handle Z?"
   - Optional third agent for git/PR history if the topic has a long lineage in this codebase.
 3. **Synthesize, don't dump.** When agents report back, write a brief — not a transcript.
 ## Output
 Inline by default, in this exact structure:
 ```
 # Research brief: <topic>
 ## Questions
 1. <decomposed question>
 2. ...
 ## What's already in this codebase
 - <finding> (path/to/file.ts:42)
 - ...
 ## External options
 - **<option>** — <one-line eval. when it fits, when it doesn't>
 - ...
 ## Prior art
 - <link> — <one-line takeaway>
 - ...
 ## Open unknowns
 - <thing no source can answer; candidate for /prototype>
 - ...
 ## Recommended next step
 <one sentence>
 ```
 If the user says "save it", write the brief to `docs/research/<short-kebab-name>.md`.
 ## Quality bar
 - Specific over comprehensive. A 1-page brief that surfaces the real decision beats a 5-page survey.
 - Cite sources for every claim. `file:line` for codebase, URL for external. No floating assertions.
 - Name what you don't know. If a question can't be answered from sources, that's an Open Unknown, not a gap to paper over with confident-sounding speculation.
 - Don't recommend a winner among external options. Surface tradeoffs; `/prd` picks.
 ## What not to do
 - Don't write code. Not even illustrative snippets. The output is a brief, not a sketch.
 - Don't open files yourself to skim — let the Explore agent do that. Synthesizing is your job.
 - Don't fabricate. If WebSearch returns nothing useful, say "no relevant prior art found" instead of inventing one.
 - Don't make product decisions. "Should we use X or Y?" → both, with tradeoffs, then "your call."
--- a/.claude/skills/ship-it/SKILL.md
+++ b/.claude/skills/ship-it/SKILL.md
@@ -0,0 +1,115 @@
 ---
 name: ship-it
 description: AFK autopilot. Drives a shell loop that works through every ready issue in the tracker (GitHub via gh, Gitea via tea), implementing each vertical slice end-to-end and committing per issue. Status streams to the terminal so the human can tail progress locally and Ctrl-C anytime. The shell is the loop; each iteration dispatches one fresh headless Claude run to ship one issue. Use when the user invokes /ship-it, says "go AFK on this", "work the backlog", "ralph the issues", or "ship everything".
 ---
 # Ship It — AFK backlog autopilot
 **Mode: AFK.** No human in the loop. Does not ask questions mid-run. If a slice is undecidable, the iteration labels the issue `needs-decision` and the loop moves on. The human gets one summary at the end, not chatter during.
 ## How this works (read before invoking)
 The actual loop runs in a shell script: `.claude/skills/ship-it/loop.sh`. **The shell is the loop**, not you. Each iteration shells out to a fresh, headless `claude -p` invocation that processes exactly one issue using `.claude/skills/ship-it/iterate.md` as its prompt. Three reasons this design beats "LLM keeps going inside one session":
 1. **Fresh context per issue.** No drift, no accumulated history bloating the window.
 2. **Visible in the terminal.** Progress streams to stdout and tees to a log file. The human can tail it from another shell, see commits land, and Ctrl-C cleanly.
 3. **Survives session close.** Closing the interactive Claude window doesn't kill the loop. Re-attach by tailing the log.
 ## Files
 - `loop.sh` — orchestrator. Tracker detection, preflight, dispatch loop, status output, stop conditions, summary.
 - `iterate.md` — the prompt passed to each per-issue headless Claude. Read it; it defines what "shipped" means.
 - `SKILL.md` — this file. When the user invokes `/ship-it`, you bootstrap and hand off.
 ## When the user invokes /ship-it
 You (the interactive Claude) do the bootstrap, not the work. Concretely:
 1. **Preflight in chat** (catches the obvious failures before the script runs):
   - `git status --porcelain` empty?
   - On `main` (or `$SHIP_IT_TRUNK`)? Up-to-date with origin?
   - `gh auth status` (or tea token) returns 0?
   - `gh issue list --state open --label slice | wc -l` ≥ 1?
 2. **Show the plan** in one short block: tracker host, trunk branch, count of ready issues, the first 3 issue titles, the log path. Nothing more.
 3. **Ask one question:** "Start? Reply `go`." This is the *only* human-in-the-loop checkpoint — kicking off AFK work is a real commitment, deserves an explicit ok.
 4. **On `go`:** run the loop in the foreground so the user sees live output:
   ```
   bash .claude/skills/ship-it/loop.sh
   ```
   Do not background it. Do not pipe through anything that buffers. The user can Ctrl-C.
 5. **While it runs:** stay silent. Don't interject. Don't "monitor" by re-reading logs in chat — the user has the terminal.
 6. **When it exits:** read the final `==== ship-it summary ====` block from the log file, present it once with concrete next steps ("2 issues are `needs-decision` — open them to answer their questions?").
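For step 6, the summary block can be pulled out of the run log without re-reading the whole file. A sketch, assuming each run log contains a single summary marker; `print_summary` is a hypothetical helper name:

```bash
# Hypothetical helper: print the summary block (marker line to end of file)
# from one run log.
print_summary() {
  sed -n '/==== ship-it summary ====/,$p' "$1"
}

# Example: newest run log in the default log directory.
#   print_summary "$(ls -t .ship-it-logs/run-*.log | head -1)"
```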
 ## Following progress
 The script logs to stdout AND tees to `.ship-it-logs/run-<RUN_ID>.log`. Tail from another terminal:
 ```bash
 tail -f .ship-it-logs/run-*.log
 ```
 Per-issue detail (everything the headless Claude did for that one issue) is in `.ship-it-logs/iter-<RUN_ID>-<ISSUE>.log` — useful for debugging a failed iteration.
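To find which per-issue logs are worth opening, the final result line of each one can be scanned. A sketch under the assumption that every iteration log ends with an `ITERATION_RESULT:` line as `iterate.md` requires; `failed_iterations` is a hypothetical helper name:

```bash
# Hypothetical helper: for each per-issue log in a directory, print the
# file name and its last ITERATION_RESULT line if the status is "failed".
failed_iterations() {
  local dir="$1" f line
  for f in "$dir"/iter-*.log; do
    [ -e "$f" ] || continue   # glob did not match: no logs yet
    line="$(grep -E '^ITERATION_RESULT:' "$f" | tail -1 || true)"
    case "$line" in
      *"status=failed"*) printf '%s: %s\n' "$f" "$line" ;;
    esac
  done
}

# Example:
#   failed_iterations .ship-it-logs
```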
 Commits land in git as the loop runs. Watch with:
 ```bash
 watch -n 5 'git log --oneline -10 origin/main'
 ```
 ## Config (env vars, override before invoking)
 | Var | Default | Purpose |
 |---|---|---|
 | `SHIP_IT_MAX` | 50 | Hard cap on iterations per run |
 | `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
 | `SHIP_IT_TRUNK` | `main` | Trunk branch name |
 | `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout (kills the headless claude) |
 | `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Where logs go |
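The defaults in the table follow the usual `${VAR:-default}` pattern, so an override only needs to be present in the environment when the script starts. A small sketch of how the values resolve; `show_ship_it_config` is a hypothetical helper for illustration, and the override values are arbitrary:

```bash
# Print the effective settings, using the same defaults as the table above.
show_ship_it_config() {
  echo "max=${SHIP_IT_MAX:-50}"
  echo "max_fail=${SHIP_IT_MAX_FAIL:-3}"
  echo "trunk=${SHIP_IT_TRUNK:-main}"
  echo "timeout=${SHIP_IT_TIMEOUT:-30m}"
}

# A subshell keeps the override from leaking into the calling shell.
( SHIP_IT_MAX=10; SHIP_IT_TIMEOUT=45m; show_ship_it_config )

# The same idea for a real run (values are illustrative):
#   SHIP_IT_MAX=10 SHIP_IT_TIMEOUT=45m bash .claude/skills/ship-it/loop.sh
```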
 ## What each iteration does (per `iterate.md`)
 For one issue: read it → branch from trunk → write failing e2e test at the outermost layer → implement layer by layer until the test passes → run the full suite → outermost-layer smoke check → commit (one commit, message ends `Closes #N`) → push → open PR with acceptance-criteria checkboxes + smoke evidence → wait for CI → merge if green and branch protection allows, else leave open for review → return to trunk → emit `ITERATION_RESULT:` line for the loop.
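 **Commit per issue:** yes. One commit per slice is the target, referenced to the issue, and it lands on the branch before the PR opens. `iterate.md` tolerates a tight series when one commit is impractical, but never WIP or fixup commits. The slice scope was made small in `/prd-to-issues` precisely so this is one tight commit, not a series.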
 ## Stop conditions (in priority order)
 1. **User Ctrl-C** → trap catches SIGINT, current step finishes cleanly, summary prints, exit 130.
 2. **Backlog empty** (no ready issues) → exit 0.
 3. **Three consecutive hard failures** → exit 1. Something systemic — bad dependency, branch protection blocking, flaky env. Surfaces for human review.
 4. **Precondition violated mid-run** → exit non-zero with reason.
 ## What "ready" means (the loop's filter)
 An issue is `ready` iff:
 - State is open
 - Has label `slice` (filed by `/prd-to-issues`)
 - Does NOT have label `blocked`, `needs-decision`, or `ci-failed`
 - Is not a spike (spikes deliver decisions, not code — humans handle those)
 Issues are processed in number order — walking-skeleton first, as `/prd-to-issues` ordered them.
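The label rules above reduce to a small predicate. This is a plain-shell sketch of that logic, not the filter the loop runs against the tracker; `is_ready` and its comma-separated label argument are assumptions made for illustration. The spike rule is not modeled because the text does not say how spikes are marked:

```bash
# Hypothetical predicate: is_ready <state> <comma-separated labels>
# Succeeds when the issue is open, carries "slice", and carries none of
# the excluding labels.
is_ready() {
  local state="$1" labels=",$2," bad
  [ "$state" = "open" ] || return 1
  case "$labels" in *",slice,"*) ;; *) return 1 ;; esac
  for bad in blocked needs-decision ci-failed; do
    case "$labels" in *",$bad,"*) return 1 ;; esac
  done
  return 0
}

# Example:
#   is_ready open "slice,backend" && echo "ready"
```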
 ## Safety boundaries
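 The headless Claude is launched with a tool allowlist (`Bash,Edit,Write,Read,Grep,Glob,WebFetch`). Because `Bash` is on that list, the allowlist alone does not block destructive commands; the limits below are enforced by the `iterate.md` prompt, with branch protection and CI as the backstop. It must not: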
 - Force-push or rewrite shared history
 - Bypass branch protection or skip CI hooks (`--no-verify`, `--admin`)
 - Auto-merge red or pending PRs (the iterate prompt forbids it, and CI gates back it up)
 - Modify CI/CD config or IaC unless the slice's `Slice — layers touched` line explicitly names that layer
 - Close issues without the outermost-layer smoke check passing
 - Assign people or change milestones/projects
 If something tries to push past these in practice (e.g. a slice "needs" a CI change to pass), it should fail the iteration with `needs-decision` and let a human approve the scope expansion.
 ## What not to do
 - **Don't drive the loop yourself by reading issues and implementing them inline.** The shell is the loop. If you're tempted to "just do this one in chat," stop and run the script.
 - **Don't background the script** so the user can keep chatting with you. The output IS the value. The user wants to watch it work.
 - **Don't summarize between iterations.** Chatter belongs in the final summary, not after each commit.
 - **Don't tag the user in PR/issue comments** during the run. They're not in the loop until the script exits.
 - **Don't restart a failed iteration manually.** The loop's `needs-decision` and `ci-failed` labels are how failures stay in the tracker for human triage. Manual restart skips that.
 ## How this fits the chain
 `/grill-me <feature>` (together) → `/prd` (together) → `/prd-to-issues` (mostly together, file step needs `create`) → `/ship-it` (AFK). The four-skill arc takes a vague feature idea to merged code with one human checkpoint per phase boundary.
--- a/.claude/skills/ship-it/iterate.md
+++ b/.claude/skills/ship-it/iterate.md
@@ -0,0 +1,70 @@
 # ship-it iterate — one issue, end-to-end
 You are running ONE iteration of the ship-it AFK loop. Implement, verify, and ship exactly one issue, then exit. The outer shell loop will pick the next one.
 **Mode: AFK.** Do not ask questions. If the issue is genuinely undecidable from its body + linked PRD + grilling notes already in the issue or repo, drop a comment on the issue with the specific question, label it `needs-decision`, and exit with status=needs-decision. Do not guess at user intent.
 Variables provided below this prompt: `ISSUE_NUMBER`, `TRACKER_CLI` (`gh` or `tea`), `TRUNK_BRANCH`, `REPO_ROOT`.
 ## Steps
 1. **Read the issue.**
   - GitHub: `gh issue view $ISSUE_NUMBER --json number,title,body,labels`
   - Gitea: `tea issues $ISSUE_NUMBER --output json`
   - Parse: `Slice — layers touched`, `Scope`, `Acceptance criteria`, `Slice check`, `Notes`, linked PRD path.
   - If `Acceptance criteria` is missing or non-testable → exit status=needs-decision with reason "acceptance criteria not testable".
 2. **Branch from latest trunk.**
   `git fetch origin && git switch -c "slice/${ISSUE_NUMBER}-<short-kebab-slug>" "origin/$TRUNK_BRANCH"`
 3. **Write the failing e2e test first.** Anchored at the OUTERMOST layer named in `Slice — layers touched` (HTTP endpoint, UI smoke, dashboard query, log assertion — whatever the acceptance criterion observes). Run it. Confirm it fails for the right reason. If you can't write an e2e test for this slice, that's a sign the acceptance criterion isn't really observable end-to-end → exit status=needs-decision.
 4. **Implement layer by layer.** Walk the `Slice — layers touched` list. Make the minimal change at each layer to satisfy the slice — do not gold-plate, do not refactor adjacent code, do not "improve" things outside scope. Re-run the e2e test after each layer change.
 5. **Run the broader test suite.** Catch regressions caused by the slice. Fix any test that was green before and is now red; do not skip it or mark it as an expected failure. If a test was already red before your changes, leave it and note that in the PR body.
 6. **Outermost-layer smoke check.** The 30-second-demo check: hit the endpoint with curl, query the dashboard, tail the log, load the page. Observe what the acceptance criterion observes. Capture the output (curl response body, log snippet, query result) — you'll paste it into the PR body as evidence.
 7. **Commit.** One commit per slice (or a tight series — no WIP commits, no fixup commits, no "address review" before review exists). Read the repo's recent `git log` to match commit style. Message ends with `Closes #${ISSUE_NUMBER}`.
 8. **Push and open PR.**
   - GitHub: `git push -u origin HEAD && gh pr create --fill`
   - Gitea: `git push -u origin HEAD && tea pr create --title "..." --description "..."`
   - PR body must include:
     - Each acceptance criterion as a checked `- [x]` line.
     - The smoke-check evidence (curl output / log snippet / screenshot path) in a fenced block.
     - `Closes #${ISSUE_NUMBER}` (so the issue auto-closes on merge).
 9. **Wait for CI and decide merge.**
   - Poll: `gh pr checks --watch` (or `tea pr status`).
   - **All green + branch protection allows direct merge** → `gh pr merge --squash --delete-branch`. Verify the merge commit landed on trunk.
   - **All green + branch protection requires human review** → leave PR open. Comment `Ready for review — all acceptance criteria verified, smoke check passed.` on the issue. Exit status=shipped with the PR number.
   - **Red CI** → allow one fix-and-push cycle. Read the failing log and fix the actual cause (do not skip the test). If CI is still red after that retry: label the issue `ci-failed`, comment with the CI excerpt, leave the PR open, and exit status=failed with reason "ci-red-after-retry".
 10. **Return to trunk.** `git switch $TRUNK_BRANCH && git pull --ff-only`. If the slice was merged, run the smoke check one more time against integrated trunk. If it fails there → revert the merge, label `regression`, exit status=failed with reason "regression-on-trunk".
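Step 2 leaves the `<short-kebab-slug>` part of the branch name open. One possible derivation, assuming the slug comes from the issue title and is capped at 40 characters (both the cap and the `kebab_slug` name are assumptions for illustration):

```bash
# Hypothetical helper: kebab_slug <title> [max-length]
# Lowercases the title, collapses every run of non-alphanumerics into a
# single hyphen, trims hyphens at both ends, and caps the length.
kebab_slug() {
  local title="$1" max="${2:-40}"
  printf '%s\n' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-"$max" \
    | sed -E 's/-+$//'
}

# Example use in step 2 ($title would come from the issue JSON):
#   git switch -c "slice/${ISSUE_NUMBER}-$(kebab_slug "$title")" "origin/$TRUNK_BRANCH"
```

The second trim after `cut` matters because truncation can leave a trailing hyphen.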
 ## Boundaries
 - Never force-push, never rewrite shared history, never delete branches you didn't create.
 - Never bypass branch protection (`--admin`) or skip CI hooks (`--no-verify`).
 - Never auto-merge a PR whose CI is red or pending.
 - Never close an issue without the outermost-layer smoke check passing.
 - Never modify CI/CD config, IaC, or production data unless the slice's `layers touched` explicitly names that layer.
 - Never invent acceptance criteria. If they're vague, label `needs-decision`.
 - Never assign issues or change milestones.
 ## Final output line
 The shell loop greps for this exact line to determine outcome. Print it as the LAST line before exiting, on its own line, no decoration:
 ```
 ITERATION_RESULT: status=<shipped|failed|needs-decision> issue=#<N> pr=<#N|none> reason=<short single-line reason>
 ```
 Examples:
 ```
 ITERATION_RESULT: status=shipped issue=#142 pr=#287 reason=merged-to-main
 ITERATION_RESULT: status=shipped issue=#143 pr=#288 reason=open-for-review
 ITERATION_RESULT: status=failed issue=#144 pr=#289 reason=ci-red-after-retry
 ITERATION_RESULT: status=needs-decision issue=#145 pr=none reason=acceptance-criteria-not-testable
 ```
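The line is a sequence of space-separated `key=value` pairs with `reason` last, so a consumer can split it with plain parameter expansion. A sketch of one way to read a field; `result_field` is a hypothetical helper, and the shell loop may parse the line differently:

```bash
# Hypothetical helper: result_field <key> <ITERATION_RESULT line>
# Prints the value for the key, or fails if the line or key is missing.
result_field() {
  local key="$1" line="$2" rest
  case "$line" in "ITERATION_RESULT: "*) ;; *) return 1 ;; esac
  case "$line" in *" $key="*) ;; *) return 1 ;; esac
  rest="${line#*" $key="}"
  # reason is the last field and runs to end of line; other values stop
  # at the next space.
  if [ "$key" != "reason" ]; then rest="${rest%% *}"; fi
  printf '%s\n' "$rest"
}

# Example:
#   result_field status "ITERATION_RESULT: status=shipped issue=#142 pr=#287 reason=merged-to-main"
```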
--- a/.claude/skills/ship-it/loop.sh
+++ b/.claude/skills/ship-it/loop.sh
@@ -0,0 +1,189 @@
 #!/usr/bin/env bash
 # ship-it AFK loop — works through every ready issue end-to-end.
 # See SKILL.md for design. Ctrl-C to stop; partial work is preserved on disk.
 set -uo pipefail
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null)" || { echo "not in a git repo"; exit 1; }
 # ---- config (env-overridable) ----
 MAX_ITERATIONS="${SHIP_IT_MAX:-50}"
 MAX_CONSECUTIVE_FAILURES="${SHIP_IT_MAX_FAIL:-3}"
 TRUNK_BRANCH="${SHIP_IT_TRUNK:-main}"
 ITERATION_TIMEOUT="${SHIP_IT_TIMEOUT:-30m}"   # per-issue cap
 LOG_DIR="${SHIP_IT_LOG_DIR:-$REPO_ROOT/.ship-it-logs}"
 mkdir -p "$LOG_DIR"
 RUN_ID="$(date -u +%Y%m%dT%H%M%SZ)"
 LOG_FILE="$LOG_DIR/run-$RUN_ID.log"
 # ---- logging ----
 log() {
  local ts; ts="$(date -u +%H:%M:%S)"
  printf '[%s] %s\n' "$ts" "$*" | tee -a "$LOG_FILE"
 }
 die() { log "FATAL: $*"; exit 1; }
 # ---- graceful interrupt ----
 INTERRUPTED=0
 on_interrupt() {
  INTERRUPTED=1
  log ""
  log "interrupt received — finishing current step cleanly, then stopping"
 }
 trap on_interrupt INT
 # ---- tracker detection ----
 ORIGIN_URL="$(git -C "$REPO_ROOT" remote get-url origin 2>/dev/null || true)"
 if [[ "$ORIGIN_URL" == *"github.com"* ]]; then
  TRACKER_CLI="gh"
  command -v gh >/dev/null || die "gh CLI not installed"
  gh auth status >/dev/null 2>&1 || die "gh not authenticated (run: gh auth login)"
  list_ready_issues() {
    gh issue list --state open --label slice --limit 100 \
      --json number,title,labels \
      --jq '[.[] | select(.labels | map(.name) | (contains(["blocked"]) or contains(["needs-decision"]) or contains(["ci-failed"])) | not)] | sort_by(.number)'
  }
 elif [[ "$ORIGIN_URL" == *"gitea"* ]]; then
  TRACKER_CLI="tea"
  command -v tea >/dev/null || die "tea CLI not installed (Gitea repo detected) — install tea or switch to a GitHub remote"
  list_ready_issues() {
    tea issues list --state open --output json 2>/dev/null \
      | jq '[.[] | select((.labels // []) | map(.name) | (contains(["blocked"]) or contains(["needs-decision"]) or contains(["ci-failed"])) | not) | select((.labels // []) | map(.name) | contains(["slice"]))] | sort_by(.index)'
  }
 else
  die "unknown tracker for origin: '$ORIGIN_URL' (need github.com or gitea.*)"
 fi
 # ---- preflight ----
 cd "$REPO_ROOT"
 [[ -z "$(git status --porcelain)" ]] || die "git tree is dirty — commit or stash before starting"
 CURRENT_BRANCH="$(git branch --show-current)"
 [[ "$CURRENT_BRANCH" == "$TRUNK_BRANCH" ]] || die "not on $TRUNK_BRANCH (on '$CURRENT_BRANCH')"
 git fetch origin "$TRUNK_BRANCH" >/dev/null 2>&1 || die "git fetch failed"
 LOCAL_SHA="$(git rev-parse HEAD)"
 REMOTE_SHA="$(git rev-parse "origin/$TRUNK_BRANCH")"
 [[ "$LOCAL_SHA" == "$REMOTE_SHA" ]] || die "$TRUNK_BRANCH not up-to-date with origin (pull first)"
 command -v claude >/dev/null || die "claude CLI not on PATH"
 # ---- banner ----
 log "ship-it run $RUN_ID"
 log "  tracker:  $TRACKER_CLI ($ORIGIN_URL)"
 log "  trunk:    $TRUNK_BRANCH @ ${LOCAL_SHA:0:8}"
 log "  log:      $LOG_FILE"
 log "  config:   max_iter=$MAX_ITERATIONS, max_fail=$MAX_CONSECUTIVE_FAILURES, timeout=$ITERATION_TIMEOUT"
 log ""
 ITERATE_PROMPT_TEMPLATE="$(cat "$SCRIPT_DIR/iterate.md")"
 SHIPPED=()
 FAILED=()
 NEEDS_DECISION=()
 CONSECUTIVE_FAILURES=0
 ITERATION=0
 # ---- main loop ----
 while (( ITERATION < MAX_ITERATIONS )); do
  (( INTERRUPTED )) && break
  ITERATION=$((ITERATION + 1))
  READY_JSON="$(list_ready_issues 2>/dev/null || echo '[]')"
  READY_COUNT="$(echo "$READY_JSON" | jq 'length' 2>/dev/null || echo 0)"
  if (( READY_COUNT == 0 )); then
    log "backlog empty — stopping"
    break
  fi
  ISSUE_NUM="$(echo "$READY_JSON" | jq -r '.[0].number // .[0].index')"
  ISSUE_TITLE="$(echo "$READY_JSON" | jq -r '.[0].title')"
  log "─────────────────────────────────────────────────────────────"
  log "iter $ITERATION | #$ISSUE_NUM \"$ISSUE_TITLE\" ($READY_COUNT ready) → starting"
  ITER_LOG="$LOG_DIR/iter-$RUN_ID-$ISSUE_NUM.log"
  PROMPT="$ITERATE_PROMPT_TEMPLATE
 ## Variables for this iteration
 - ISSUE_NUMBER=$ISSUE_NUM
 - TRACKER_CLI=$TRACKER_CLI
 - TRUNK_BRANCH=$TRUNK_BRANCH
 - REPO_ROOT=$REPO_ROOT
 Begin."
  ITER_START="$(date +%s)"
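  # errexit is never enabled in this script (it runs under `set -uo pipefail`),
  # so capture the exit code directly instead of toggling `set -e`.
  CLAUDE_EXIT=0
  timeout "$ITERATION_TIMEOUT" claude -p "$PROMPT" \
    --allowed-tools "Bash,Edit,Write,Read,Grep,Glob,WebFetch" \
    --output-format text \
    >"$ITER_LOG" 2>&1 || CLAUDE_EXIT=$?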
  ITER_END="$(date +%s)"
  ITER_DURATION=$((ITER_END - ITER_START))
  RESULT_LINE="$(grep -E '^ITERATION_RESULT:' "$ITER_LOG" | tail -1 || true)"
  STATUS="$(echo "$RESULT_LINE" | sed -n 's/.*status=\([^ ]*\).*/\1/p')"
  PR_FIELD="$(echo "$RESULT_LINE" | sed -n 's/.*pr=\([^ ]*\).*/\1/p')"
  REASON="$(echo "$RESULT_LINE" | sed -n 's/.*reason=\(.*\)/\1/p')"
  if (( CLAUDE_EXIT == 124 )); then
    STATUS="failed"
    REASON="timeout after $ITERATION_TIMEOUT"
  fi
  case "$STATUS" in
    shipped)
      log "iter $ITERATION | #$ISSUE_NUM ✓ shipped → PR $PR_FIELD (${ITER_DURATION}s)"
      SHIPPED+=("#$ISSUE_NUM→$PR_FIELD")
      CONSECUTIVE_FAILURES=0
      ;;
    failed)
      log "iter $ITERATION | #$ISSUE_NUM ✗ failed: $REASON (${ITER_DURATION}s, see $ITER_LOG)"
      FAILED+=("#$ISSUE_NUM ($REASON)")
      CONSECUTIVE_FAILURES=$((CONSECUTIVE_FAILURES + 1))
      ;;
    needs-decision)
      log "iter $ITERATION | #$ISSUE_NUM ? needs-decision: $REASON (${ITER_DURATION}s)"
      NEEDS_DECISION+=("#$ISSUE_NUM ($REASON)")
      CONSECUTIVE_FAILURES=0
      ;;
    *)
      log "iter $ITERATION | #$ISSUE_NUM ! unknown outcome (claude exit=$CLAUDE_EXIT, ${ITER_DURATION}s) — see $ITER_LOG"
      FAILED+=("#$ISSUE_NUM (unknown outcome)")
      CONSECUTIVE_FAILURES=$((CONSECUTIVE_FAILURES + 1))
      ;;
  esac
  if (( CONSECUTIVE_FAILURES >= MAX_CONSECUTIVE_FAILURES )); then
    log "$MAX_CONSECUTIVE_FAILURES consecutive failures — stopping for human review"
    break
  fi
  # back to trunk for next iteration
  if [[ "$(git branch --show-current)" != "$TRUNK_BRANCH" ]]; then
    git switch "$TRUNK_BRANCH" >/dev/null 2>&1 || log "  warn: could not return to $TRUNK_BRANCH"
  fi
  git pull --ff-only origin "$TRUNK_BRANCH" >/dev/null 2>&1 || log "  warn: could not fast-forward $TRUNK_BRANCH"
 done
 # ---- summary ----
 log ""
 log "==== ship-it summary ===="
 log "iterations:     $ITERATION"
 log "shipped:        ${#SHIPPED[@]} ${SHIPPED[*]:-}"
 log "failed:         ${#FAILED[@]} ${FAILED[*]:-}"
 log "needs-decision: ${#NEEDS_DECISION[@]} ${NEEDS_DECISION[*]:-}"
 log "log:            $LOG_FILE"
 if (( INTERRUPTED )); then
  log "stop reason:    user-interrupt"
  exit 130
 elif (( CONSECUTIVE_FAILURES >= MAX_CONSECUTIVE_FAILURES )); then
  log "stop reason:    consecutive-failures"
  exit 1
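 elif (( ITERATION >= MAX_ITERATIONS && ${READY_COUNT:-0} > 0 )); then
  # The loop ended on the iteration cap while issues were still ready.
  log "stop reason:    max-iterations"
  exit 0
 else
  log "stop reason:    backlog-empty"
  exit 0
 fi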
--- a/.gitignore
+++ b/.gitignore
@@ -20,3 +20,4 @@ tools/.env
 .repo-mem/
 .codex
 CLAUDE.local.md
 .prototypes/
--- a/docker/grafana/provisioning/dashboards/coresync-frost-demo.json
+++ b/docker/grafana/provisioning/dashboards/coresync-frost-demo.json
@@ -0,0 +1,546 @@
 {
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": { "type": "grafana", "uid": "-- Grafana --" },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": { "limit": 100, "matchAny": false, "tags": [], "type": "dashboard" },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 1,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "id": 50,
      "type": "text",
      "title": "How to read this dashboard",
      "gridPos": { "h": 5, "w": 24, "x": 0, "y": 0 },
      "options": {
        "mode": "markdown",
        "content": "**Each metric below is mentally verifiable. Hover any panel title for its definition.**\n\n| Term | Definition | Where it comes from |\n|---|---|---|\n| **raw** | every numeric sample EVOLV nodes wrote to InfluxDB before CoreSync | `_measurement = FROST Flow Sensor FT-101` (field `mAbs`) and `_measurement = rotatingmachine_cse_rm_pump` (5 named fields) |\n| **knots** | the CoreSync-reduced samples actually kept | `_measurement = coresync_knots`, `_field = knot` |\n| **reductionPct** | `100 × (1 − knots/raw)` — % of writes CoreSync skipped (higher is better) | computed in-query |\n| **kept fraction** | `knots / raw` (inverse; lower is better) | computed in-query |\n| **reason** | why CoreSync emitted a knot: `first` (1st sample), `angle-change` (slope direction shifted), `max-gap` (silent too long), `flush` (periodic) | tag on `coresync_knots` |\n\n**Sanity checks:** open the Per-stream table — `raw × (1 − reductionPct/100) = knots` should hold to the integer. The headline scoreboard sums all rows. The Knot interarrival panel should never go below ~2 s for streams updating at 1 Hz (if it does, CoreSync is over-emitting → burst-window bug)."
      }
    },
    {
      "id": 100,
      "type": "row",
      "title": "Scoreboard — raw vs knots over the selected time range",
      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 5 },
      "collapsed": false,
      "panels": []
    },
    {
      "id": 1,
      "type": "stat",
      "title": "Raw samples written",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "Total raw sample writes from EVOLV nodes into InfluxDB across all known CoreSync-tracked streams (FT-101 flow, P-101 pressures, efficiency, cog, SEC). This is what InfluxDB would store WITHOUT CoreSync compression.",
      "gridPos": { "h": 4, "w": 6, "x": 0, "y": 6 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "fixed", "fixedColor": "#1f6feb" },
          "unit": "short",
          "decimals": 0,
          "mappings": []
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": { "values": false, "calcs": ["lastNotNull"], "fields": "" },
        "textMode": "auto"
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "raw_ft101 = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"FROST Flow Sensor FT-101\" and r._field == \"mAbs\") |> count() |> keep(columns:[\"_value\"])\nraw_rm = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\") |> filter(fn:(r)=> r._field == \"pressure.measured.downstream.dashboard-sim-downstream\" or r._field == \"pressure.measured.upstream.dashboard-sim-upstream\" or r._field == \"efficiency.predicted.atequipment.cse_rm_pump\" or r._field == \"cog\" or r._field == \"specificEnergyConsumption.predicted.atequipment.cse_rm_pump\") |> group(columns:[\"_field\"]) |> count() |> group() |> keep(columns:[\"_value\"])\nunion(tables:[raw_ft101, raw_rm]) |> sum()"
        }
      ]
    },
    {
      "id": 2,
      "type": "stat",
      "title": "CoreSync knots kept",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "Total CoreSync knots actually written to InfluxDB. Each knot represents a 'meaningful' sample chosen by the angle-change reducer plus periodic flushes.",
      "gridPos": { "h": 4, "w": 6, "x": 6, "y": 6 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "fixed", "fixedColor": "#2f9e44" },
          "unit": "short",
          "decimals": 0,
          "mappings": []
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": { "values": false, "calcs": ["lastNotNull"], "fields": "" },
        "textMode": "auto"
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"coresync_knots\" and r._field == \"knot\") |> group() |> count()"
        }
      ]
    },
    {
      "id": 3,
      "type": "gauge",
      "title": "Reduction % (1 − knots / raw)",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "Headline compression number. 100% = perfect compression (impossible). 0% = CoreSync is keeping every sample (broken). Sweet spot for the FROST demo: 60–95% depending on stream.",
      "gridPos": { "h": 4, "w": 6, "x": 12, "y": 6 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "thresholds" },
          "min": 0,
          "max": 100,
          "unit": "percent",
          "decimals": 1,
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "#d64545", "value": null },
              { "color": "#e8a23a", "value": 40 },
              { "color": "#2f9e44", "value": 70 }
            ]
          }
        },
        "overrides": []
      },
      "options": {
        "orientation": "auto",
        "reduceOptions": { "values": false, "calcs": ["lastNotNull"], "fields": "" },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "import \"array\"\nraw_ft101 = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"FROST Flow Sensor FT-101\" and r._field == \"mAbs\") |> count() |> keep(columns:[\"_value\"])\nraw_rm = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\") |> filter(fn:(r)=> r._field == \"pressure.measured.downstream.dashboard-sim-downstream\" or r._field == \"pressure.measured.upstream.dashboard-sim-upstream\" or r._field == \"efficiency.predicted.atequipment.cse_rm_pump\" or r._field == \"cog\" or r._field == \"specificEnergyConsumption.predicted.atequipment.cse_rm_pump\") |> group(columns:[\"_field\"]) |> count() |> group() |> keep(columns:[\"_value\"])\nraw_total = union(tables:[raw_ft101, raw_rm]) |> sum() |> findRecord(fn:(key)=> true, idx:0)\nknot_total = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"coresync_knots\" and r._field == \"knot\") |> group() |> count() |> findRecord(fn:(key)=> true, idx:0)\nrawN = if exists raw_total._value then float(v: raw_total._value) else 0.0\nknotN = if exists knot_total._value then float(v: knot_total._value) else 0.0\narray.from(rows: [{_value: (if rawN > 0.0 then 100.0 * (1.0 - knotN / rawN) else 0.0)}])"
        }
      ]
    },
    {
      "id": 4,
      "type": "stat",
      "title": "Approx. bytes saved",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "Rough estimate: (raw − knots) × 80 bytes per line-protocol record. Order-of-magnitude only; actual savings depend on tag cardinality and retention policy.",
      "gridPos": { "h": 4, "w": 6, "x": 18, "y": 6 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "fixed", "fixedColor": "#a347e1" },
          "unit": "decbytes",
          "decimals": 0,
          "mappings": []
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": { "values": false, "calcs": ["lastNotNull"], "fields": "" },
        "textMode": "auto"
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "import \"array\"\nraw_ft101 = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"FROST Flow Sensor FT-101\" and r._field == \"mAbs\") |> count() |> keep(columns:[\"_value\"])\nraw_rm = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\") |> filter(fn:(r)=> r._field == \"pressure.measured.downstream.dashboard-sim-downstream\" or r._field == \"pressure.measured.upstream.dashboard-sim-upstream\" or r._field == \"efficiency.predicted.atequipment.cse_rm_pump\" or r._field == \"cog\" or r._field == \"specificEnergyConsumption.predicted.atequipment.cse_rm_pump\") |> group(columns:[\"_field\"]) |> count() |> group() |> keep(columns:[\"_value\"])\nraw_total = union(tables:[raw_ft101, raw_rm]) |> sum() |> findRecord(fn:(key)=> true, idx:0)\nknot_total = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"coresync_knots\" and r._field == \"knot\") |> group() |> count() |> findRecord(fn:(key)=> true, idx:0)\nrawN = if exists raw_total._value then raw_total._value else 0\nknotN = if exists knot_total._value then knot_total._value else 0\narray.from(rows: [{_value: (rawN - knotN) * 80}])"
        }
      ]
    },
    {
      "id": 200,
      "type": "row",
      "title": "Per-stream verification table — every line is mentally checkable",
      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 10 },
      "collapsed": false,
      "panels": []
    },
    {
      "id": 5,
      "type": "table",
      "title": "Per-stream raw vs knots vs reduction %",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "One row per CoreSync stream. raw = raw samples written to InfluxDB. knots = CoreSync-kept samples. reductionPct = 100 × (1 − knots/raw). Streams with reductionPct < 50 are flagged red. Each cell is line-of-sight to a known Flux query — see the dashboard's 'How to read' panel at top.",
      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 11 },
      "fieldConfig": {
        "defaults": {
          "custom": { "align": "auto", "cellOptions": { "type": "auto" }, "inspect": false },
          "color": { "mode": "thresholds" }
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "reductionPct" },
            "properties": [
              { "id": "unit", "value": "percent" },
              { "id": "decimals", "value": 1 },
              { "id": "custom.cellOptions", "value": { "type": "color-background", "mode": "gradient" } },
              {
                "id": "thresholds",
                "value": {
                  "mode": "absolute",
                  "steps": [
                    { "color": "#d64545", "value": null },
                    { "color": "#e8a23a", "value": 40 },
                    { "color": "#2f9e44", "value": 70 }
                  ]
                }
              }
            ]
          },
          {
            "matcher": { "id": "byName", "options": "raw" },
            "properties": [{ "id": "unit", "value": "short" }, { "id": "decimals", "value": 0 }]
          },
          {
            "matcher": { "id": "byName", "options": "knots" },
            "properties": [{ "id": "unit", "value": "short" }, { "id": "decimals", "value": 0 }]
          }
        ]
      },
      "options": {
        "cellHeight": "sm",
        "footer": { "countRows": false, "fields": "", "reducer": ["sum"], "show": false },
        "showHeader": true,
        "sortBy": [{ "desc": false, "displayName": "reductionPct" }]
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "import \"join\"\n\nraw_ft101 = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"FROST Flow Sensor FT-101\" and r._field == \"mAbs\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"P-101:flow:measured:upstream:FT-101\", raw:r._value }))\nraw_rm_pdn = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"pressure.measured.downstream.dashboard-sim-downstream\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:pressure:measured:downstream:dashboard-sim-downstream\", raw:r._value }))\nraw_rm_pup = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"pressure.measured.upstream.dashboard-sim-upstream\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:pressure:measured:upstream:dashboard-sim-upstream\", raw:r._value }))\nraw_rm_eff = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"efficiency.predicted.atequipment.cse_rm_pump\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:efficiency:predicted:atequipment:cse_rm_pump\", raw:r._value }))\nraw_rm_cog = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"cog\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:cog:measured:atEquipment:MEASURED-p-101\", raw:r._value }))\nraw_rm_sec = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == 
\"rotatingmachine_cse_rm_pump\" and r._field == \"specificEnergyConsumption.predicted.atequipment.cse_rm_pump\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:specificenergyconsumption:predicted:atequipment:cse_rm_pump\", raw:r._value }))\n\nraw = union(tables:[raw_ft101, raw_rm_pdn, raw_rm_pup, raw_rm_eff, raw_rm_cog, raw_rm_sec]) |> group(columns:[\"streamKey\"])\n\nknots = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement==\"coresync_knots\" and r._field==\"knot\") |> keep(columns:[\"streamKey\",\"_value\"]) |> group(columns:[\"streamKey\"]) |> count(column:\"_value\") |> rename(columns:{_value:\"knots\"})\n\njoin.left(left: raw, right: knots, on: (l, r) => l.streamKey == r.streamKey, as: (l, r) => ({ streamKey: l.streamKey, raw: l.raw, knots: if exists r.knots then r.knots else 0 }))\n  |> map(fn:(r)=> ({ r with reductionPct: if r.raw > 0 then 100.0 * (1.0 - float(v:r.knots) / float(v:r.raw)) else 0.0 }))\n  |> group()\n  |> sort(columns:[\"reductionPct\"])"
        }
      ],
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {},
            "indexByName": { "streamKey": 0, "raw": 1, "knots": 2, "reductionPct": 3 },
            "renameByName": {}
          }
        }
      ]
    },
    {
      "id": 300,
      "type": "row",
      "title": "Signal reconstruction — do the knots faithfully represent the raw signal?",
      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 19 },
      "collapsed": false,
      "panels": []
    },
    {
      "id": 6,
      "type": "timeseries",
      "title": "Flow FT-101 — raw 1 Hz vs CoreSync knots (m³/h)",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "FT-101 raw flow values vs the CoreSync knots written for the same stream. If knots reconstruct the signal, big dots sit exactly on the raw line at every direction change. Same Y-axis so they should overlap.",
      "gridPos": { "h": 10, "w": 12, "x": 0, "y": 20 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "m³/h",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 5,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 3,
            "scaleDistribution": { "type": "linear" },
            "showPoints": "auto",
            "spanNulls": true,
            "stacking": { "group": "A", "mode": "none" },
            "thresholdsStyle": { "mode": "off" }
          },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
          "unit": "flowm3h"
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "knot (m³/h)" },
            "properties": [
              { "id": "custom.drawStyle", "value": "points" },
              { "id": "custom.pointSize", "value": 10 },
              { "id": "custom.showPoints", "value": "always" },
              { "id": "color", "value": { "mode": "fixed", "fixedColor": "#d64545" } }
            ]
          },
          {
            "matcher": { "id": "byName", "options": "raw (m³/h)" },
            "properties": [
              { "id": "custom.lineWidth", "value": 2 },
              { "id": "color", "value": { "mode": "fixed", "fixedColor": "#1f6feb" } }
            ]
          }
        ]
      },
      "options": {
        "legend": { "calcs": ["lastNotNull", "count", "min", "max"], "displayMode": "table", "placement": "bottom", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "raw = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"FROST Flow Sensor FT-101\" and r._field == \"mAbs\") |> map(fn:(r)=>({ r with _field: \"raw (m³/h)\" }))\nknots = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"coresync_knots\" and r._field == \"result\" and r.streamKey == \"P-101:flow:measured:upstream:FT-101\") |> map(fn:(r)=>({ r with _field: \"knot (m³/h)\" }))\nunion(tables:[raw, knots])"
        }
      ]
    },
    {
      "id": 7,
      "type": "timeseries",
      "title": "Pressure downstream — raw 0.5 Hz vs CoreSync knots (mbar)",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "P-101 simulated downstream pressure raw values vs CoreSync knots for the same stream. Pressure cycles every 2 s; knots should appear at each direction change plus every 15 s flush.",
      "gridPos": { "h": 10, "w": 12, "x": 12, "y": 20 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "mbar",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 5,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 3,
            "scaleDistribution": { "type": "linear" },
            "showPoints": "auto",
            "spanNulls": true,
            "stacking": { "group": "A", "mode": "none" },
            "thresholdsStyle": { "mode": "off" }
          },
          "mappings": [],
          "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
          "unit": "pressuremb"
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "knot (mbar)" },
            "properties": [
              { "id": "custom.drawStyle", "value": "points" },
              { "id": "custom.pointSize", "value": 10 },
              { "id": "custom.showPoints", "value": "always" },
              { "id": "color", "value": { "mode": "fixed", "fixedColor": "#d64545" } }
            ]
          },
          {
            "matcher": { "id": "byName", "options": "raw (mbar)" },
            "properties": [
              { "id": "custom.lineWidth", "value": 2 },
              { "id": "color", "value": { "mode": "fixed", "fixedColor": "#1f6feb" } }
            ]
          }
        ]
      },
      "options": {
        "legend": { "calcs": ["lastNotNull", "count", "min", "max"], "displayMode": "table", "placement": "bottom", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "raw = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"pressure.measured.downstream.dashboard-sim-downstream\") |> map(fn:(r)=>({ r with _field: \"raw (mbar)\" }))\nknots = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"coresync_knots\" and r._field == \"result\" and r.streamKey == \"p-101:pressure:measured:downstream:dashboard-sim-downstream\") |> map(fn:(r)=>({ r with _field: \"knot (mbar)\" }))\nunion(tables:[raw, knots])"
        }
      ]
    },
    {
      "id": 400,
      "type": "row",
      "title": "Diagnostics — why CoreSync chose to emit (or not)",
      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 30 },
      "collapsed": false,
      "panels": []
    },
    {
      "id": 8,
      "type": "timeseries",
      "title": "Knot interarrival time per stream (seconds since previous knot)",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "Time between successive knots per stream. A stream emitting a knot every tick (~1 s) is not compressing. A healthy stream shows seconds-to-tens-of-seconds between knots and a hard cap at 15 s (the flush interval).",
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 31 },
      "fieldConfig": {
        "defaults": {
          "color": { "mode": "palette-classic" },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "s",
            "axisPlacement": "auto",
            "drawStyle": "points",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
            "lineWidth": 0,
            "pointSize": 4,
            "scaleDistribution": { "type": "log", "log": 10 },
            "showPoints": "always",
            "spanNulls": false,
            "thresholdsStyle": { "mode": "line+area" }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "transparent", "value": null },
              { "color": "rgba(214, 69, 69, 0.15)", "value": 0 },
              { "color": "transparent", "value": 2 }
            ]
          },
          "unit": "s",
          "min": 0.1
        },
        "overrides": []
      },
      "options": {
        "legend": { "calcs": ["mean", "min", "max"], "displayMode": "table", "placement": "bottom", "showLegend": true },
        "tooltip": { "mode": "multi", "sort": "none" }
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"coresync_knots\" and r._field == \"knot\") |> drop(columns:[\"_value\"]) |> group(columns:[\"streamKey\"]) |> sort(columns:[\"_time\"]) |> elapsed(unit:1ms, columnName:\"_value\") |> map(fn:(r)=>({ r with _value: float(v:r._value) / 1000.0 })) |> filter(fn:(r)=> r._value > 0.0)"
        }
      ]
    },
    {
      "id": 9,
      "type": "table",
      "title": "Compression health — full math per stream (knots ÷ raw = kept; 1 − kept = saved)",
      "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
      "description": "Each row shows every number that goes into the compression decision so the math is verifiable in your head. 'kept' is the inverse of the reductionPct in the table above (knots/raw). 'savedPct' equals reductionPct in the per-stream table — same number, different visualization. Verify: kept + savedPct/100 ≈ 1 for every row.",
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 31 },
      "fieldConfig": {
        "defaults": {
          "custom": { "align": "auto", "cellOptions": { "type": "auto" }, "inspect": false },
          "color": { "mode": "thresholds" }
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "kept" },
            "properties": [
              { "id": "unit", "value": "percentunit" },
              { "id": "decimals", "value": 3 },
              { "id": "min", "value": 0 },
              { "id": "max", "value": 1 },
              { "id": "custom.cellOptions", "value": { "type": "gauge", "mode": "gradient", "valueDisplayMode": "color" } },
              {
                "id": "thresholds",
                "value": {
                  "mode": "absolute",
                  "steps": [
                    { "color": "#2f9e44", "value": null },
                    { "color": "#e8a23a", "value": 0.30 },
                    { "color": "#d64545", "value": 0.50 }
                  ]
                }
              }
            ]
          },
          {
            "matcher": { "id": "byName", "options": "savedPct" },
            "properties": [
              { "id": "unit", "value": "percent" },
              { "id": "decimals", "value": 1 },
              { "id": "custom.cellOptions", "value": { "type": "color-background", "mode": "gradient" } },
              {
                "id": "thresholds",
                "value": {
                  "mode": "absolute",
                  "steps": [
                    { "color": "#d64545", "value": null },
                    { "color": "#e8a23a", "value": 50 },
                    { "color": "#2f9e44", "value": 70 }
                  ]
                }
              }
            ]
          },
          {
            "matcher": { "id": "byName", "options": "raw" },
            "properties": [{ "id": "unit", "value": "short" }, { "id": "decimals", "value": 0 }]
          },
          {
            "matcher": { "id": "byName", "options": "knots" },
            "properties": [{ "id": "unit", "value": "short" }, { "id": "decimals", "value": 0 }]
          }
        ]
      },
      "options": {
        "cellHeight": "sm",
        "footer": { "countRows": false, "fields": "", "reducer": ["sum"], "show": false },
        "showHeader": true,
        "sortBy": [{ "desc": true, "displayName": "kept" }]
      },
      "targets": [
        {
          "refId": "A",
          "datasource": { "type": "influxdb", "uid": "cdzg44tv250jkd" },
          "query": "import \"join\"\n\nraw_ft101 = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"FROST Flow Sensor FT-101\" and r._field == \"mAbs\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"P-101:flow:measured:upstream:FT-101\", raw:r._value }))\nraw_rm_pdn = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"pressure.measured.downstream.dashboard-sim-downstream\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:pressure:measured:downstream:dashboard-sim-downstream\", raw:r._value }))\nraw_rm_pup = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"pressure.measured.upstream.dashboard-sim-upstream\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:pressure:measured:upstream:dashboard-sim-upstream\", raw:r._value }))\nraw_rm_eff = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"efficiency.predicted.atequipment.cse_rm_pump\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:efficiency:predicted:atequipment:cse_rm_pump\", raw:r._value }))\nraw_rm_cog = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == \"rotatingmachine_cse_rm_pump\" and r._field == \"cog\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:cog:measured:atEquipment:MEASURED-p-101\", raw:r._value }))\nraw_rm_sec = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement == 
\"rotatingmachine_cse_rm_pump\" and r._field == \"specificEnergyConsumption.predicted.atequipment.cse_rm_pump\") |> count() |> keep(columns:[\"_value\"]) |> map(fn:(r)=>({ streamKey:\"p-101:specificenergyconsumption:predicted:atequipment:cse_rm_pump\", raw:r._value }))\n\nraw = union(tables:[raw_ft101, raw_rm_pdn, raw_rm_pup, raw_rm_eff, raw_rm_cog, raw_rm_sec]) |> group(columns:[\"streamKey\"])\n\nknots = from(bucket:\"telemetry\") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn:(r)=> r._measurement==\"coresync_knots\" and r._field==\"knot\") |> keep(columns:[\"streamKey\",\"_value\"]) |> group(columns:[\"streamKey\"]) |> count(column:\"_value\") |> rename(columns:{_value:\"knots\"})\n\njoin.left(left: raw, right: knots, on: (l, r) => l.streamKey == r.streamKey, as: (l, r) => ({ streamKey: l.streamKey, raw: l.raw, knots: if exists r.knots then r.knots else 0 }))\n  |> map(fn:(r)=> ({ streamKey: r.streamKey, raw: r.raw, knots: r.knots, kept: if r.raw > 0 then float(v:r.knots) / float(v:r.raw) else 0.0, savedPct: if r.raw > 0 then 100.0 * (1.0 - float(v:r.knots) / float(v:r.raw)) else 0.0 }))\n  |> group()\n  |> sort(columns:[\"kept\"], desc:true)"
        }
      ],
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {},
            "indexByName": { "streamKey": 0, "raw": 1, "knots": 2, "kept": 3, "savedPct": 4 },
            "renameByName": {}
          }
        }
      ]
    }
  ],
  "refresh": "5s",
  "schemaVersion": 39,
  "style": "dark",
  "tags": ["EVOLV", "CoreSync", "FROST"],
  "templating": { "list": [] },
  "time": { "from": "now-3m", "to": "now" },
  "timepicker": {},
  "timezone": "",
  "title": "CoreSync FROST Demo",
  "uid": "coresync-frost-demo",
  "version": 2,
  "weekStart": ""
 }
--- a/docs/prd/dashboardapi-graph-aware-grafana-generator.md
+++ b/docs/prd/dashboardapi-graph-aware-grafana-generator.md
@@ -0,0 +1,82 @@
 # dashboardAPI v2 — graph-aware Grafana dashboard generator
 _Date: 2026-05-26 · Owner: R&D · Predecessors: `/grill-me` (in-conversation), [`docs/research/dashboardapi-graph-aware-grafana-generator.md`](../research/dashboardapi-graph-aware-grafana-generator.md)_
 One `dashboardAPI` node in a Node-RED flow auto-generates one Grafana dashboard by walking its child-registration graph, composing per-node-type panel templates, and pushing the result to Grafana via HTTP on every Node-RED deploy.
 ## Problem
 Every EVOLV example flow today carries a hand-authored Node-RED Dashboard tab — the active `pumpingstation-complete-example` flow has 73 `ui-*` nodes (charts, gauges, text widgets, fan-out function nodes) consuming roughly a third of the flow. Every new example replicates this work, and each one diverges in axis ranges, chart configs, and fan-out logic — so the output side is inconsistent across the 10+ example flows we maintain. The same telemetry already lands in InfluxDB via Port 1 of every node, so Grafana could render it natively, but today each Grafana dashboard is hand-authored JSON (`docker/grafana/provisioning/dashboards/pumping-station.json` is the only one that exists, frozen at one node type). Result: R&D spends disproportionate time on dashboard plumbing, examples drift, and Grafana — the better readout — is underused.
 ## Goals
 1. Dropping a `dashboardAPI` node into a flow and deploying produces a complete Grafana dashboard with no hand-authored JSON.
 2. Adding a new EVOLV node *instance* (e.g. a new measurement child) to a flow adds its panels on the next deploy with zero Grafana edits.
 3. Adding a new EVOLV node *type* requires only a panel template fragment under `nodes/dashboardAPI/src/templates/<softwareType>.json` — no changes to the layout engine.
 4. Cross-example consistency: every example flow's Grafana dashboard uses the same panel set, axis conventions, and dashed-bounds rendering for the same node type.
 5. Node-RED Dashboard tab in example flows shrinks to control-only widgets (mode select, operator demand, calibration, signal injection). Target: ≤15 `ui-*` nodes per example flow.
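 The `ui-*` budget in goal 5 can be checked mechanically. A minimal sketch, assuming the flow is available as a standard Node-RED export (a JSON array of node objects, each with a `type` property) and that Dashboard widgets are exactly the nodes whose type starts with `ui-`. The names `countUiNodes` and `checkUiBudget` and the sample flow are hypothetical, not part of the repo:

```javascript
// Hypothetical helper: count Node-RED Dashboard widgets in an exported flow.
// Assumption: the export is a JSON array of node objects with a string `type`.
function countUiNodes(flow) {
  return flow.filter(
    (node) => typeof node.type === 'string' && node.type.startsWith('ui-')
  ).length;
}

// Compare the widget count against a budget (goal 5 targets at most 15).
function checkUiBudget(flow, maxUiNodes = 15) {
  const count = countUiNodes(flow);
  return { count, maxUiNodes, withinBudget: count <= maxUiNodes };
}

// Tiny made-up flow, for illustration only.
const exampleFlow = [
  { id: 'a1', type: 'tab', label: 'demo' },
  { id: 'b1', type: 'ui-chart' },
  { id: 'b2', type: 'ui-gauge' },
  { id: 'c1', type: 'function' },
];

console.log(checkUiBudget(exampleFlow));
// { count: 2, maxUiNodes: 15, withinBudget: true }
```

 Run against a real export, the same check could gate example flows in CI, so a regression past the 15-widget target is caught before review.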
 ## Non-goals
 - Sub-second feedback latency from operator action → Grafana visible state. End-to-end ≤15s is acceptable; faster is not pursued.
 - Preserving manual Grafana edits across regenerations. dashboardAPI is the single source of truth for its dashboards; manual edits are clobbered on the next deploy.
 - Per-instance dashboard customization through the Grafana UI. Templates are centralized and code-owned.
 - Supporting non-EVOLV (third-party) Node-RED node types as panel sources.
 - Live runtime regeneration (no deploy). Regen fires on Node-RED deploy events only.
 - Operator (plant-staff) UX. Sole user is R&D until further notice.
 - Replacing the InfluxDB write path. dashboardAPI v2 reuses the existing `outputUtils.formatForInflux` + `influxdbFormatter` plumbing unchanged.
 ## Users & scenarios
 Sole user: EVOLV R&D team (Rene, Pim, Janneke, Sjoerd, Dieke, Pieter).
 1. **New example flow from scratch.** When R&D builds a new example for `rotatingMachine-complete`, they assemble the node graph (pumpingStation + 3 pumps + measurements), drop in one dashboardAPI, connect each top-level parent to it, and deploy. A Grafana dashboard at the dashboardAPI's UID appears within seconds, with rows per parent and panels per child following the centralized templates.
 2. **Adding a measurement to an existing flow.** When R&D wires a new measurement node as a child of an existing pumpingStation in `pumpingstation-complete-example` and redeploys, the pumpingStation's row gains a `measured` panel beside the matching `predicted` panel (see F-7). No Grafana edit.
 3. **Adding a new EVOLV node type.** When R&D ships a new node type `mixer`, they author `nodes/dashboardAPI/src/templates/mixer.json` (Grafana panel fragment with `${nodeName}` substitution tokens) and bump dashboardAPI's package version. Existing dashboardAPI instances pick up mixer-typed children on next deploy.
 ## Requirements
 ### Functional
 1. **F-1.** `dashboardAPI` shall subscribe to the `flows:started` runtime event via `RED.events.on('flows:started', handler)` and, on each event, inspect `payload.diff` to determine whether any node in its own subtree (the dashboardAPI node, its registered children, their registered grandchildren) was affected. If yes, regenerate the dashboard. If no, no-op.
 2. **F-2.** On regenerate, `dashboardAPI` shall walk its registered children via `ChildRegistrationUtils.getAllChildren()`, recurse one level per registered child to discover grandchildren, and produce an ordered list `[{softwareType, nodeName, position, children: [...]}, ...]`.
 3. **F-3.** For each node in the graph, `dashboardAPI` shall load the matching template at `nodes/dashboardAPI/src/templates/${softwareType}.json` and substitute the placeholders `${nodeName}`, `${nodeId}`, `${parentName}`, `${dashboardUid}` and any child-list placeholders into the panel JSON.
 4. **F-4.** The layout engine shall compose templates into a single Grafana dashboard JSON with: one row per top-level child of dashboardAPI; nested rows for grandchildren; sequential `gridPos.y` offsets so panels don't overlap.
 5. **F-5.** Parent panels shall **not** repeat metrics that any of their children's templates already emit. The template format declares each panel's `emittedFields` so the composer can filter duplicates from the parent's panel set.
 6. **F-6.** For each child node of type `rotatingMachine`, the panel set shall include: `%control`, `flow`, `delta P`, any registered measurement child's measured values, and `efficiency`. Where the node config exposes operating bounds (e.g. min/max flow), those bounds shall be rendered as dashed reference lines on the same panel as the actual value, by setting `custom.lineStyle = {fill: "dash", dash: [10,10]}` through a `byName` entry in `fieldConfig.overrides`.
 7. **F-7.** For each child of type `measurement` registered to a parent that also emits a `predicted` series for the same quantity, the dashboard shall render two panels side by side (predicted left, measured right). If only `predicted` exists, render the predicted panel only. If only `measured` exists, render the measured panel only.
 8. **F-8.** `dashboardAPI` shall send the assembled dashboard to `POST {grafanaUrl}/api/dashboards/db` with body `{dashboard: <json>, overwrite: true, folderUid: <configured>}`, using the configured bearer token in `Authorization: Bearer <token>`. The `dashboard.uid` shall be derived deterministically from the dashboardAPI node's Node-RED id.
 9. **F-9.** On a successful upsert (HTTP 200), `dashboardAPI` shall log the dashboard URL at info level. On failure (non-2xx, timeout, network error), it shall log at error level with the response body and shall **not** retry; the next deploy is the retry mechanism.
 10. **F-10.** Each node emitting a value with operating bounds shall write the bounds as additional Influx fields named `<field>.min` and `<field>.max` alongside `<field>` itself. The dashed-line overrides from F-6 target these fields by their `.min` / `.max` suffixed names.
 11. **F-11.** The bearer token shall be stored as a Node-RED encrypted credential, not as a plain `defaults` field. On node startup, if the legacy plain field exists, it is migrated to the credential store and the plain field is cleared, with one info-level log line per migrated instance.
 12. **F-12.** `dashboardAPI` shall expose `msg.topic == "regenerate-dashboard"` as a manual trigger that bypasses the diff check and forces a regenerate.
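F-3 and F-4 can be illustrated with a minimal sketch. It assumes a template fragment is a parsed Grafana panel object carrying its own `gridPos.w` / `gridPos.h`, and that each top-level node gets one flat row; nested rows for grandchildren are left out. The names `substitute`, `layout`, and the sample template are hypothetical and not taken from the existing code.

```javascript
const GRID_WIDTH = 24; // Grafana's dashboard grid is 24 columns wide

// Replace ${token} placeholders in every string of a parsed template fragment.
// Unknown tokens are left as-is, so a typo stays visible in the rendered panel.
function substitute(value, vars) {
  if (typeof value === 'string') {
    return value.replace(/\$\{(\w+)\}/g, (match, key) =>
      Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match);
  }
  if (Array.isArray(value)) return value.map((item) => substitute(item, vars));
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, substitute(item, vars)]));
  }
  return value; // numbers, booleans and null pass through unchanged
}

// Emit one row header per node, then its panels left to right, wrapping at the
// grid width. y only ever grows, so panels cannot overlap.
function layout(nodes) {
  const panels = [];
  let y = 0;
  let id = 1;
  for (const node of nodes) {
    panels.push({ id: id++, type: 'row', title: node.nodeName, collapsed: false,
      gridPos: { h: 1, w: GRID_WIDTH, x: 0, y } });
    y += 1;
    let x = 0;
    let rowHeight = 0;
    for (const panel of node.panels) {
      const { w, h } = panel.gridPos;
      if (x + w > GRID_WIDTH) { y += rowHeight; x = 0; rowHeight = 0; }
      panels.push({ ...panel, id: id++, gridPos: { h, w, x, y } });
      x += w;
      rowHeight = Math.max(rowHeight, h);
    }
    y += rowHeight;
  }
  return panels;
}

// Hypothetical template fragment; real ones would be loaded from the templates directory.
const template = {
  type: 'timeseries',
  title: '${nodeName} flow',
  gridPos: { h: 8, w: 12 },
};
const nodes = ['pump1', 'pump2'].map((nodeName) => ({
  nodeName,
  panels: [substitute(template, { nodeName }), substitute(template, { nodeName })],
}));
const panels = layout(nodes);
console.log(panels.map((p) => `${p.title} @ y=${p.gridPos.y}, x=${p.gridPos.x}`));
```

Leaving unknown tokens untouched also lets Grafana's own `${variable}` dashboard variables pass through a template unchanged, which may or may not be desirable once real templates exist.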
 ### Non-functional
 - **N-1. Performance.** Dashboard composition (graph walk + template merge + JSON build, excluding HTTP roundtrip) shall complete in <500ms for a flow with up to 50 registered children.
 - **N-2. Idempotency.** Running the regenerate path twice in a row with no intervening graph change produces a byte-identical dashboard JSON.
 - **N-3. Security.** The bearer token shall never appear in any log line, status update, debug output, or admin endpoint response. Token-bearing HTTP requests shall keep TLS certificate verification enabled when the configured Grafana URL is `https://`.
 - **N-4. Observability.** Every regenerate emits a structured log line via the `logger` shared utility with fields: `dashboardUid`, `childCount`, `grandchildCount`, `compositionDurationMs`, `httpStatus`, `outcome ∈ {success, http-error, network-error, no-diff}`.
 - **N-5. Backward compatibility.** Existing dashboardAPI instances continue to write to InfluxDB exactly as before. The Grafana-push path is additive and disabled if no `grafanaUrl` is configured.
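One way to make N-2 testable is to serialize the dashboard with a canonical key order, so two regenerations of the same graph compare equal as strings. The helper below is a sketch under that assumption; `stableStringify` is a hypothetical name and the PRD does not prescribe this technique. Array order is preserved, so the graph walk itself still has to emit children in a deterministic order.

```javascript
// Serialize plain JSON data with object keys sorted, so the output does not
// depend on property insertion order. Assumes plain data only (no functions,
// Dates or undefined values).
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const members = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${members.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Same content, different insertion order.
const first = { title: 'pump1', panels: [{ id: 1, type: 'timeseries' }] };
const second = { panels: [{ type: 'timeseries', id: 1 }], title: 'pump1' };
console.log(stableStringify(first) === stableStringify(second)); // true
```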
 ## Constraints & dependencies
 - **Grafana version pinned.** `docker-compose.yml` shall pin to `grafana/grafana:11.3.0` (or whatever specific minor exists at first-issue time) instead of `latest`. The legacy `POST /api/dashboards/db` endpoint is the target; the Grafana 12 Kubernetes-style API is out of scope. This resolves research **O-3**.
 - **Node-RED runtime events.** Depends on `RED.events.on('flows:started')` firing with a `payload.diff` shape (added/changed/removed arrays) — undocumented but stable in current Node-RED versions. Verified by prototype before first issue ships.
 - **InfluxDB write path unchanged.** Reuses existing `outputUtils.formatForInflux` + `influxdbFormatter`. No schema migration to existing telemetry.
 - **Tag schema.** Every Influx field used by a panel must follow the existing emission convention (`_measurement = nodeName`, `_field = type.variant.position.childId`).
 - **Scaffolding to reuse:** `ChildRegistrationUtils.getAllChildren()` (`nodes/generalFunctions/src/helper/childRegistrationUtils.js:104-106`), `extractChildren()` (`nodes/dashboardAPI/src/specificClass.js:151-163`), `grafanaUpsertUrl()` (`:107-110`, URL builder exists, HTTP send missing), `BaseNodeAdapter` lifecycle pattern.
 - **No new npm dependencies** for the HTTP path. Use Node's built-in `https`/`http` modules.
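The upsert in F-8 can be sketched with only Node's built-in modules, as this constraint asks. The function names, the `evolv-` uid prefix, the 40-character truncation, and the 5-second timeout are assumptions for illustration; the existing `grafanaUpsertUrl()` is not reproduced here.

```javascript
const http = require('node:http');
const https = require('node:https');

// Assumed scheme: derive a stable uid from the Node-RED node id, restricted to
// characters that are safe in a Grafana uid and truncated to 40 characters.
function dashboardUid(nodeId) {
  return `evolv-${nodeId}`.replace(/[^a-zA-Z0-9_-]/g, '-').slice(0, 40);
}

// Build everything needed for the request without touching the network.
function buildUpsertRequest({ grafanaUrl, token, folderUid, dashboard }) {
  const url = new URL(`${grafanaUrl.replace(/\/+$/, '')}/api/dashboards/db`);
  const payload = { dashboard, overwrite: true };
  if (folderUid) payload.folderUid = folderUid; // omit when not configured
  const body = JSON.stringify(payload);
  return {
    url,
    body,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    },
  };
}

// Send once, no retry (F-9). Resolves with status and raw body; rejects on
// network error or timeout. TLS verification stays at Node's default (on).
function sendUpsert(request, timeoutMs = 5000) {
  const transport = request.url.protocol === 'https:' ? https : http;
  return new Promise((resolve, reject) => {
    const req = transport.request(request.url, request.options, (res) => {
      let text = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { text += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: text }));
    });
    req.setTimeout(timeoutMs, () => req.destroy(new Error('Grafana upsert timed out')));
    req.on('error', reject);
    req.end(request.body);
  });
}

// Build-only demo; nothing is sent.
const uid = dashboardUid('a1b2c3d4.e5f6');
const request = buildUpsertRequest({
  grafanaUrl: 'http://localhost:3000/',
  token: 'test-token',
  folderUid: '',
  dashboard: { uid, title: 'pumpingstation-complete-example' },
});
console.log(request.url.href, uid);
```

Keeping the request builder separate from the sender means the payload shape can be unit-tested without a running Grafana, and the token only ever lives in the `Authorization` header object.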
 ## Success metrics
 1. **Hand-authored Grafana JSON in repo = 0.** Measured by counting JSON files in `docker/grafana/provisioning/dashboards/` minus the dynamically-uploaded ones. Current: 2 (pumping-station.json, coresync-frost-demo.json). Target after rollout: 0 file-based, N dynamic.
 2. **`ui-*` node count per example flow ≤ 15** (down from 73 in the current `pumpingstation-complete-example`). Measured by grepping `examples/*.flow.json` after migration.
 3. **Time-to-first-dashboard for a new example flow ≤ 1 minute of human work** (drop in dashboardAPI, configure URL + token, deploy). Measured by stopwatch on the next example flow that gets built.
 4. **Regression coverage:** every example flow's dashboard URL returns HTTP 200 and renders without panel errors. Measured by an integration test that hits the Grafana API after deploying each example.
 ## Open questions
 - **O-1. `flows:started` + `diff` reliability across deploy modes.** Source-readable but needs a spike to confirm `diff` cleanly distinguishes "this dashboardAPI's subtree changed" from "an unrelated flow changed", across `full` / `nodes` / `flows` deploy types. → Resolved by `/prototype` before issue I-3 (the lifecycle hook issue) starts.
 - **O-2. Dashed-line `custom.lineStyle` rendering against real Influx series.** Open Grafana bugs [#75259](https://github.com/grafana/grafana/issues/75259) and [#86546](https://github.com/grafana/grafana/issues/86546) may affect us. → Resolved by `/prototype` before issue I-5 (rotatingMachine template) starts.
 - **O-5 (new).** Folder UID handling — does dashboardAPI assume a single Grafana folder for all generated dashboards (configured per-instance), or create per-flow folders? Default: per-instance configured folder UID, optional. If empty, dashboards land in the General folder. → Owner: R&D, deadline: before I-4.
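The diff check behind O-1 and F-1 might look like the sketch below. It assumes `payload.diff` carries `added`, `changed`, and `removed` arrays of node ids (the shape this PRD relies on but has not yet verified), and that a missing `diff` should be treated as "everything changed". `subtreeAffected` is a hypothetical helper.

```javascript
// Decide whether a deploy touched any node in this dashboardAPI's subtree.
// diff: assumed shape { added: string[], changed: string[], removed: string[] }
// subtreeIds: Set of node ids (dashboardAPI, children, grandchildren)
function subtreeAffected(diff, subtreeIds) {
  if (!diff) return true; // no diff available: regenerate to be safe
  const touched = ['added', 'changed', 'removed'].flatMap((key) => diff[key] || []);
  return touched.some((id) => subtreeIds.has(id));
}

// In a node this would be called from a handler registered with
// RED.events.on('flows:started', handler), using payload.diff.
const subtree = new Set(['dash1', 'ps1', 'pump1']);
console.log(subtreeAffected({ added: [], changed: ['pump1'], removed: [] }, subtree)); // true
console.log(subtreeAffected({ added: ['other'], changed: [], removed: [] }, subtree)); // false
```

A removed child is no longer part of the post-deploy subtree, so a real implementation would probably need to remember the previous subtree as well; that is one of the edge cases the O-1 spike has to settle.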
 ## Out of scope (v2 candidates)
 - Per-instance panel customization through the Grafana UI with merge-on-regen.
 - Operator-facing UX (Grafana role/permission management, embedded dashboards in Node-RED).
 - Auto-discovery of measurement units / axis ranges from node config schemas.
 - Multi-Grafana-instance fanout (push the same dashboard to staging + prod).
 - Grafana alerts / notification policies generated from EVOLV alarm definitions.
 - Dashboard versioning / rollback inside Grafana.
 - Template fragments living next to their owning node (decentralized template discovery).
--- a/docs/research/dashboardapi-graph-aware-grafana-generator.md
+++ b/docs/research/dashboardapi-graph-aware-grafana-generator.md
@@ -0,0 +1,56 @@
 # Research brief: graph-aware Grafana dashboard generator in dashboardAPI
 _Date: 2026-05-26_
 _Context: follows `/grill-me` session that locked design constraints; feeds into `/prd`._
 ## Questions
 1. Node-RED lifecycle: how does a custom node reliably detect "deploy complete" across deploy types?
 2. Prior art: existing Node-RED → Grafana auto-dashboard generators
 3. Grafana HTTP API: idempotent dashboard updates by UID, version conflicts, RBAC
 4. Dynamic min/max envelope pattern: dashed reference lines that vary over time
 5. EVOLV-internal scaffolding already in place
 ## Design constraints already settled in `/grill-me`
 1. dashboardAPI = dashboard **generator**, not just an InfluxDB writer.
 2. One dashboardAPI instance = one Grafana dashboard. Multiple instances coexist.
 3. Single source of truth: regen on Node-RED deploy **clobbers** manual Grafana edits.
 4. Trigger: HTTP API push from dashboardAPI to Grafana, fired on Node-RED deploy.
 5. Auth: per-flow Grafana service-account token.
 6. Templates centralized in `nodes/dashboardAPI/src/templates/` per node type.
 7. Per-instance `_measurement` = node name (already in `influxdbFormatter`).
 8. **No data duplication** between parent and child panels (MGC shows group-level only).
 9. Predicted-vs-measured = 2 panels side by side; predicted only when no measured registered.
 10. Per-pump panel set: %control / flow / delta P / measured-from-children / efficiency / dashed dynamic bounds.
 11. Static config bounds → **dashed reference lines** that follow the live operating envelope (top/bottom dashed + act value).
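Constraint 8 (no duplication between parent and child panels) could be enforced by letting each template panel declare which fields it emits and filtering the parent's panels against the union of its children's fields. The sketch below assumes an `emittedFields` array per panel and drops a parent panel as soon as any of its fields is already covered, which is only one possible reading of the rule; the function name and sample panels are hypothetical.

```javascript
// Keep only the parent panels whose fields are not already shown by a child.
function dedupeParentPanels(parentPanels, childPanels) {
  const covered = new Set(childPanels.flatMap((panel) => panel.emittedFields || []));
  return parentPanels.filter(
    (panel) => !(panel.emittedFields || []).some((field) => covered.has(field)));
}

const parentPanels = [
  { title: 'group efficiency', emittedFields: ['efficiency'] },
  { title: 'group demand', emittedFields: ['demand'] },
];
const childPanels = [
  { title: 'pump1 efficiency', emittedFields: ['efficiency'] },
  { title: 'pump1 flow', emittedFields: ['flow', 'flow.min', 'flow.max'] },
];
const kept = dedupeParentPanels(parentPanels, childPanels);
console.log(kept.map((panel) => panel.title)); // [ 'group demand' ]
```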
 ## What's already in this codebase
 - **Child registration is fully graph-aware.** `ChildRegistrationUtils` keeps a `Map<id, {child, softwareType, position, registeredAt}>` with type-aware accessors `getAllChildren()`, `getChildById()`, `getChildrenOfType()`. (`nodes/generalFunctions/src/helper/childRegistrationUtils.js:19-106`)
 - **dashboardAPI already iterates its children.** `extractChildren()` reads `nodeSource.childRegistrationUtils.registeredChildren.values()`. (`nodes/dashboardAPI/src/specificClass.js:151-163`)
 - **Grafana upsert URL is already constructed but not yet dispatched.** `grafanaUpsertUrl()` builds the target URL — the HTTP send is missing. (`nodes/dashboardAPI/src/specificClass.js:107-110`)
 - **InfluxDB schema is `measurement: nodeName`, tags from flattened config** (id, softwareType, role, positionVsParent, uuid, tagCode, geoLocation, category, type, model, unit). (`nodes/generalFunctions/src/helper/outputUtils.js:44,99-117`; `formatters/influxdbFormatter.js:12-20`)
 - **Lifecycle hooks: only `node.on('close')` and `node.on('input')` are used.** No EVOLV node currently subscribes to `RED.events.on('flows:started')` or similar — net-new wiring. (`nodes/generalFunctions/src/nodered/BaseNodeAdapter.js:164,184`)
 - **dashboardAPI's bearer token is stored as a plain `defaults` field, NOT as a Node-RED `credentials:` block** — so it's not encrypted at rest today. (`nodes/dashboardAPI/dashboardAPI.html:15-16`; `src/nodeClass.js:38-42`) **Contradicts the grilling assumption** that "the existing InfluxDB credentials path" is already in place — it isn't.
 - **No outbound external HTTPS pattern exists anywhere in EVOLV nodes.** Net-new code path.
 ## External options
 - **Legacy Grafana API (`POST /api/dashboards/db` with `overwrite: true`).** Skips version + uid-uniqueness checks → idempotent. Returns `412 Precondition Failed` on stale version when `overwrite=false`. Minimum RBAC: `dashboards:write` scoped to a folder. ([docs](https://grafana.com/docs/grafana/latest/developers/http_api/dashboard/))
 - **Grafana 12 Kubernetes-style API (`/apis/dashboard.grafana.app/v1/...`).** Returns `409 Conflict` instead of `412`. Newer but couples integration to Grafana 12+.
 - **`flows:started` runtime event** fires on every deploy (full / nodes / flows) with `{type, diff}` payload. De-dupe by inspecting `diff.added/changed/removed`. Runtime events are undocumented — must read source. (Node-RED `packages/.../runtime/lib/flows/index.js`)
 - **`nodes-started` event is deprecated** — use `flows:started`.
 - **Dashed-line dynamic bands:** the *only* path that works today is emitting min/max as separate Influx fields + applying `fieldConfig.overrides[].properties[].id = "custom.lineStyle"` with `{fill: "dash", dash: [10,10]}`. Per-series override via `byName` matcher.
 - **Grafana thresholds are static-only** (open issue [grafana/grafana#115398](https://github.com/grafana/grafana/issues/115398) — Needs Prioritisation). Dead end for time-varying bands.
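The dashed-line option above translates into a small block of `fieldConfig.overrides`. The sketch below assumes the bounds are emitted as `<field>.min` and `<field>.max` and that the series name Grafana sees equals the Influx field name; neither has been verified against the real query output, and `dashedBoundsOverrides` is a hypothetical helper.

```javascript
// Build one byName override per bound so the min/max series render dashed
// while the value series keeps the panel's default solid line.
function dashedBoundsOverrides(field) {
  return ['min', 'max'].map((suffix) => ({
    matcher: { id: 'byName', options: `${field}.${suffix}` },
    properties: [
      { id: 'custom.lineStyle', value: { fill: 'dash', dash: [10, 10] } },
    ],
  }));
}

const panel = {
  type: 'timeseries',
  title: 'flow',
  fieldConfig: { defaults: {}, overrides: dashedBoundsOverrides('flow') },
};
console.log(JSON.stringify(panel.fieldConfig.overrides, null, 2));
```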
 ## Prior art
 - **No relevant prior art found.** Every "node-red + grafana" tutorial puts Influx in the middle and hand-builds dashboards. No npm package pushes Grafana dashboards from Node-RED. Greenfield lane.
 - **Grafana Foundation SDK / dashboards-as-code** ([docs](https://grafana.com/docs/grafana/latest/as-code/observability-as-code/foundation-sdk/)) — assumes out-of-band CI generation, not a live Node-RED instance.
 - **Operating-envelope plotting in Grafana** — [community thread 57225](https://community.grafana.com/t/how-to-plot-graph-using-upper-and-lower-bound/57225) asks the exact question, no accepted answer.
 - **Known Grafana bugs around `custom.lineStyle`:** [#75259](https://github.com/grafana/grafana/issues/75259) (transforms) and [#86546](https://github.com/grafana/grafana/issues/86546) (overlapping dashed → solid).
 ## Open unknowns
 - **(O-1) `flows:started` + `diff` reliability.** Does `diff` cleanly distinguish "this dashboardAPI's flow changed" from "an unrelated flow changed" across all three deploy modes? Source-readable but needs an actual spike to verify edge cases (e.g. a `Modified Nodes` deploy that adds a child measurement to a pumpingStation registered to a dashboardAPI in a different tab). → **Candidate for `/prototype`.**
 - **(O-2) Dashed-line rendering against real Influx series.** Two open Grafana bugs ([#75259](https://github.com/grafana/grafana/issues/75259), [#86546](https://github.com/grafana/grafana/issues/86546)) affect `custom.lineStyle`. Untested whether either bites with EVOLV's emission pattern. → **Candidate for `/prototype`.**
 - **(O-3) Legacy `/api/dashboards/db` vs v12 K8s API.** Which to commit to? Locks integration to a Grafana version family. Local stack uses `grafana/grafana:latest` — version drifts on `docker compose pull`. → PRD-time decision; pin Grafana image.
 - **(O-4) Bearer-token storage migration.** Assumption that "follow existing creds pattern" doesn't hold — dashboardAPI stores it as plain config today. Need to migrate to Node-RED `credentials:` block. Risk: token currently sitting in `flow.json` of users' existing flows. → PRD-time decision; migration step in first issue.
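For O-4, the migration decision can be kept as a pure function that is easy to test before any Node-RED wiring exists. The sketch assumes the legacy plain field and the new credential are both called `bearerToken` (a hypothetical name; the real field name in `dashboardAPI.html` may differ) and that a stored credential always wins over the legacy field. Declaring the credential itself would go through the `credentials` option of `RED.nodes.registerType`, which is not shown here.

```javascript
// Decide which token to use and whether the legacy plain field needs migrating.
// config: node config as saved in the flow; credentials: node.credentials
function planTokenMigration(config, credentials) {
  const legacy = typeof config.bearerToken === 'string' ? config.bearerToken.trim() : '';
  const stored = (credentials && credentials.bearerToken) || '';
  if (stored) return { token: stored, migrate: false, clearLegacy: legacy !== '' };
  if (legacy) return { token: legacy, migrate: true, clearLegacy: true };
  return { token: '', migrate: false, clearLegacy: false };
}

// The log text carries the node id only, never the token itself.
function migrationLogLine(nodeId) {
  return `dashboardAPI ${nodeId}: bearer token migrated to credential store`;
}

const plan = planTokenMigration({ bearerToken: 'legacy-secret' }, {});
if (plan.migrate) console.log(migrationLogLine('dash1'));
```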
 ## Recommended next step
 `/prd` — commit the design, resolve O-3 and O-4 explicitly, and queue O-1 and O-2 for `/prototype` before the first issue ships.
Author	SHA1	Message	Date
lzm	1854431ba3	merge: integrate origin/development into dev-lzm Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 15:49:14 +02:00
Rene De Ren	8a5cc552ab	chore(submodule): bump pumpingStation ef07f2a -> 2fb083d Promotes the pumpingStation development tip into the superproject. On top of the ef07f2a already pinned by the 5-node bump, this adds the contract/docs work: - docs(contract): close output-contract gaps — mode/manualDemand, Port-2 topic, output manifest (4889fda) - docs(contract): worked msg examples for every input topic + output port (2fb083d) - fix(ps): persist stopLevel/holdLevel as numbers; flow output in m³/h; etc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 13:59:28 +02:00
znetsixe	14140725bc	chore: workflow artifacts — research brief + dashboardAPI v2 PRD + submodule bumps Bumps machineGroupControl (e1e1977) and pumpingStation (ef07f2a) — example dashboard JSON tweaks committed on each submodule's development branch. Adds docs/research/ and docs/prd/ for the dashboardAPI v2 graph-aware Grafana generator workflow (Gitea issues #32-#43). Ignores .prototypes/ — throwaway spike code lives there per the /prototype skill.	2026-05-26 17:32:20 +02:00
znetsixe	3f84b91afb	chore(submodules): bump 5 nodes — UnitPolicy rollout + buffered fixes Bumps: - rotatingMachine 455f15d refactor: route unit conversions through UnitPolicy.convert - pumpingStation 2d68a4f refactor + fix(level): UnitPolicy adoption, level-rate timestamp fix, integration test rewire - machineGroupControl ddf2b07 refactor: _canonicalToOutputFlow + setDemand via UnitPolicy.convert, structure test rewire - generalFunctions bc79de1 fix(influx): accept tagCode camelCase + emit positionVsParent tag - measurement 36eaa2f test(edge): align with object-payload accept behaviour The UnitPolicy bump finishes the §6 contract migration the refactor plan named (drop _convertUnitValue / hardcoded m3/h<->m3/s scalars in favour of policy.convert at every site). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 15:30:45 +02:00
znetsixe	8c3d3ac69a	feat(dashboard): verifiable CoreSync FROST demo + bump coresync submodule Replaces the old 3-panel coresync-frost-demo.json with a 13-panel dashboard designed for at-a-glance verification of CoreSync's compression behaviour. Dashboard rebuild (docker/grafana/provisioning/dashboards/coresync-frost-demo.json): - Header "How to read" text panel: definitions table + sanity checks so every metric is line-of-sight to its Flux source. - Scoreboard row (4 stats): raw samples / CoreSync knots / reduction % / approx. bytes saved over the selected time range. - Per-stream verification table: one row per CoreSync stream with raw, knots, and reductionPct (gradient-coloured). Each line's math is mentally checkable: raw × (1 − reductionPct/100) = knots. - Signal-reconstruction overlays: flow (m³/h) and pressure (mbar) rendered as a thin raw line plus fat red knot points so you can see knots snap to the raw signal at direction changes. Fixes the previous panels which mislabelled both as `flowm3h` regardless of units. - Diagnostics row: per-stream knot-interarrival timeseries and a full-math compression-health table (raw, knots, kept fraction with gradient bar, savedPct with colour background). Bumps coresync submodule to 21d77a8 which lands the FROST demo flow plus the burst-window reducer fix that was driving cog/efficiency/SEC to ~0% compression. Verified end-to-end on the live stack: headline reduction went from 33% to 83%, broken streams from 0.6%-14% to 78%-93%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 20:27:59 +02:00
znetsixe	fcaad8cd9f	chore(skills): add /research and /prototype; rewrite README for 6-skill chain Front-loads gap discovery before /grill-me by adding two skills: research MOSTLY fans out Explore + WebSearch agents in parallel, synthesizes findings into a brief, names open unknowns explicitly (which become /prototype targets) prototype MOSTLY builds a throwaway spike to test ONE falsifiable assumption; code lives in .prototypes/ (gitignored), never promoted; output is evidence — verdict, numbers, observed behavior — that feeds /prd Full chain now: /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it Chain rationale: /research and /prototype surface knowledge gaps and falsify risky assumptions while the cost of changing direction is still cheap; the TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase (ship-it) only executes against contracts already on paper. The chain is a default, not a mandate — README covers when to skip upstream skills for small or stack-familiar work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 16:33:26 +02:00
znetsixe	6ff262e96e	chore(skills): add workflow chain — grill-me → prd → prd-to-issues → ship-it Four workflow skills that take a feature from fuzzy idea to merged code. Two human-in-the-loop phases (grill-me, prd), one mostly-together (prd-to-issues files only on explicit 'create'), and one AFK (ship-it). grill-me TOGETHER pressure-test the idea with hard interview questions prd TOGETHER synthesize PRD; gaps stay explicit, not papered over prd-to-issues MOSTLY thin vertical-slice issues with coverage matrix + per-issue Slice check; self-audits before showing ship-it AFK shell loop ships each slice end-to-end with one commit per issue, status streams to terminal, Ctrl-C-able, survives session close Vertical-slice principle throughout: every issue cuts end-to-end through every integration layer (no horizontal "do all the DB work first" issues). The AFK loop only ships against acceptance criteria already locked in by the PRD phase — autonomous code never runs against undefined contracts. ship-it tracker support: gh (GitHub) and tea (Gitea). For this repo, set SHIP_IT_TRUNK=development to override the main default. See .claude/skills/README.md for the full how-to and a worked example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 16:27:15 +02:00