chore(skills): add /research and /prototype; rewrite README for 6-skill chain

Front-loads gap discovery before /grill-me by adding two skills: research MOSTLY fans out Explore + WebSearch agents in parallel, synthesizes findings into a brief, names open unknowns explicitly (which become /prototype targets) prototype MOSTLY builds a throwaway spike to test ONE falsifiable assumption; code lives in .prototypes/ (gitignored), never promoted; output is evidence — verdict, numbers, observed behavior — that feeds /prd Full chain now: /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it Chain rationale: /research and /prototype surface knowledge gaps and falsify risky assumptions while the cost of changing direction is still cheap; the TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase (ship-it) only executes against contracts already on paper. The chain is a default, not a mandate — README covers when to skip upstream skills for small or stack-familiar work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:33:26 +02:00
parent 6ff262e96e
commit fcaad8cd9f
3 changed files with 265 additions and 93 deletions
--- a/.claude/skills/README.md
+++ b/.claude/skills/README.md
@@ -1,132 +1,127 @@
-# Workflow skills — grill-me → prd → prd-to-issues → ship-it
+# Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it

-A four-skill chain that takes a vague feature idea from "I want to build X" to merged, end-to-end-verified code. Two phases are collaborative (you + Claude), two run autonomously (AFK).
+A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.

 ```
-/grill-me <topic>      TOGETHER   pressure-test the idea
+/research <topic>      MOSTLY      external + repo knowledge into a brief
       ↓
-/prd                   TOGETHER   synthesize a PRD; gaps stay explicit
+/prototype <claim>     MOSTLY      throwaway spike to test the riskiest assumption
       ↓
-/prd-to-issues         MOSTLY     thin vertical-slice issues; file when you say "create"
+/grill-me <topic>      TOGETHER    pressure-test what survived
       ↓
-/ship-it               AFK        shell loop ships every slice end-to-end
+/prd                   TOGETHER    synthesize PRD; gaps stay explicit
+       ↓
+/prd-to-issues         MOSTLY      thin vertical-slice issues; file on "create"
+       ↓
+/ship-it               AFK         shell loop ships every slice end-to-end
 ```

-## The TOGETHER vs AFK distinction
+You don't have to use every skill on every feature. Small tweaks may skip `/research` and `/prototype`. Bigger / novel work uses the whole chain.
+
+## Mode taxonomy

 | Mode | Meaning | Skills |
 |---|---|---|
-| **TOGETHER** | Needs your judgment every turn. No autonomous path. | `grill-me` |
-| **MOSTLY TOGETHER** | Drafts/audits AFK, but a visible-to-team action needs your "go". | `prd`, `prd-to-issues` |
-| **AFK** | No human in the loop. Logs questions to issues instead of asking. | `ship-it` |
+| **TOGETHER** | Needs your turn-by-turn judgment. No autonomous path. | grill-me |
+| **MOSTLY TOGETHER** | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd, prd-to-issues |
+| **AFK** | No human in the loop. Logs questions to issues instead of asking. | ship-it |

-Each skill's `SKILL.md` declares its mode in the first body line. The chain is designed so the AFK phase only starts after the human-locked phases have nailed down the contract (PRD requirements, slice acceptance criteria) — AFK code only ships against decisions that already exist on paper.
+The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.

 ---

 ## When to use each

+### `/research <topic>` — MOSTLY TOGETHER
+Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`).
+
+**Use when:** the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".
+
+**Don't use it for:** stuff you already understand. The point is to fetch what you don't know, not summarize what you do.
+
+### `/prototype <claim>` — MOSTLY TOGETHER
+Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
+
+**Use when:** `/research` surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature.
+
+**Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
+
 ### `/grill-me <topic>` — TOGETHER
+Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say `stop` for an honest 3-bullet debrief.

-**Use when:** you have a fuzzy idea and want it pressure-tested before committing to scope. Also useful for interview prep on any technical topic.
+**Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.

-**Behavior:** acts as a senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots, demands specifics over buzzwords. Stay on the topic until exhausted; say `stop` for an honest 3-bullet debrief.
-
-**Don't use it for:** rubber-stamping a finished idea, or when you want validation. It's designed to find gaps, not to agree with you.
+**Don't use it for:** rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.

 ### `/prd` — TOGETHER (drafts AFK after grilling)
+Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.

-**Use when:** the grilling exposed enough that you're ready to lock down what's being built. Or standalone when you already know the feature shape.
+**Use when:** the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.

-**Behavior:** writes an engineering PRD (Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope). Things you nailed in the grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
-
-**Output:** inline by default. Say "save it" → writes to `docs/prd/<name>.md`.
-
-**Don't use it for:** strategy docs, market sizing, or "why now" sections. This is for engineering.
+**Don't use it for:** strategy decks, market sizing, "why now". This is for engineering.

 ### `/prd-to-issues` — MOSTLY TOGETHER
+Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you.

-**Use when:** you have a PRD (just-drafted or path-pointed) and need a backlog of issues a team can pick up.
+**Output:** draft inline → you reply `create` → files to the tracker (`gh` for GitHub, `tea` for Gitea).

-**Behavior:** breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). The first slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Each issue has a visible `Slice check` block proving every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** drafting is shown to you.
-
-**Output:** draft inline → you reply `create` → it files to the tracker (`gh` for GitHub, `tea` for Gitea).
-
-**Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing — if your work needs that, you want a different tool.
+**Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.

 ### `/ship-it` — AFK
-
-**Use when:** issues are filed, you want to walk away, you'll review the PRs later.
-
-**Behavior:** runs a shell loop that picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with checked acceptance criteria + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to the terminal; you can `tail -f` the log from another shell or Ctrl-C anytime.
+Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.

 Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stops the loop for human review.

-**Don't use it for:** issues whose acceptance criteria aren't testable (the loop will skip them), or for slices that need real-world side effects with no test harness.
+**Don't use it for:** issues whose acceptance criteria aren't testable. The loop will skip them.

 ---

 ## A worked example

-You want to add live sensor display for a new flow meter on a dashboard.
+Adding live sensor display for a new flow meter to the operator dashboard.

 ```bash
-# 1. Pressure-test the idea
+# 1. Fetch what we don't already know
+/research adding live flow-meter readings to the operator dashboard
+# Brief lands; surfaces an Open Unknown:
+#   "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
+#    straight without dropping frames?"
+
+# 2. Test the risky assumption
+/prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
+# Spike runs in .prototypes/nodered-throughput/;
+# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.
+
+# 3. Pressure-test the design
 /grill-me adding live flow-meter readings to the operator dashboard
+# 6–8 hard questions; surfaces gaps in alerting and missing-data handling.

-# (you answer 6-8 hard questions; gaps surface)
-
-# 2. Lock down the contract
+# 4. Lock down the contract
 /prd
+# PRD drafts with the alerting decision as a firm requirement and the
+# missing-data behavior as an explicit Open Question.

-# (PRD drafts; you edit one section; save it)
-
-# 3. Slice it
+# 5. Slice it
 /prd-to-issues
+# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
+# Reply `create` → issues #142..#146 filed.

-# (draft shows ~5 slices, each with a Slice check block. Coverage matrix
-# confirms every PRD requirement maps to a slice. Reply `create`.)
-# → issues #142..#146 filed
-
-# 4. Walk away
+# 6. Walk away
 /ship-it
-
-# (preflight, plan, "Start? Reply `go`." → `go` → shell loop runs)
-# In another terminal: tail -f .ship-it-logs/run-*.log
+# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
+# Another terminal:  tail -f .ship-it-logs/run-*.log
 ```

-After the loop exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`.
+After ship-it exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`.

---
+## Skipping skills

-## Configuration
+The chain is a default, not a mandate:

-### Repo trunk branch
-
-`ship-it` defaults to `main`. If your repo uses something else:
-
-```bash
-SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
-```
-
-### Other env vars
-
-| Var | Default | Purpose |
-|---|---|---|
-| `SHIP_IT_MAX` | 50 | Iteration cap |
-| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
-| `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout |
-| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
-
-### Tracker support
-
- **GitHub** — uses `gh` CLI (must be installed and authenticated: `gh auth status`).
- **Gitea** — uses `tea` CLI (must be installed: `go install code.gitea.io/tea@latest`, then `tea login add`).
- Auto-detected from `git remote get-url origin`.
-
-### Issue label expected by `ship-it`
-
-The loop filters to open issues with label `slice` and without labels `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies the `slice` label by default. If you file issues by hand, add the label or `ship-it` won't pick them up.
+- **Tiny well-understood change:** straight to `/prd-to-issues` (or file the issue by hand and run `/ship-it`).
+- **Bigger but stack-familiar:** skip `/research` and `/prototype`; start at `/grill-me`.
+- **Pure research, no implementation yet:** stop after `/research` or `/prototype` — the brief or findings are the deliverable.
+- **Existing PRD from somewhere else:** `/prd-to-issues <path>` and go.

 ---

@@ -134,7 +129,11 @@ The loop filters to open issues with label `slice` and without labels `blocked`,

 ```
 .claude/skills/
-├── README.md                  ← this file
+├── README.md                ← this file
+├── research/
+│   └── SKILL.md
+├── prototype/
+│   └── SKILL.md
 ├── grill-me/
 │   └── SKILL.md
 ├── prd/
@@ -142,39 +141,77 @@ The loop filters to open issues with label `slice` and without labels `blocked`,
 ├── prd-to-issues/
 │   └── SKILL.md
 └── ship-it/
-    ├── SKILL.md               ← entry point; chat-side bootstrap
-    ├── loop.sh                ← orchestrator (the actual loop)
-    └── iterate.md             ← per-issue prompt the loop dispatches
+    ├── SKILL.md             ← entry point; chat-side bootstrap
+    ├── loop.sh              ← orchestrator (the actual loop)
+    └── iterate.md           ← per-issue prompt the loop dispatches
+
+.prototypes/                 ← throwaway spike code (gitignored, created by /prototype)
+.ship-it-logs/               ← ship-it loop logs (recommend gitignoring)
+docs/
+├── research/                ← saved research briefs ("save it" in /research)
+└── prd/                     ← saved PRDs ("save it" in /prd)
 ```

 ---

+## Configuration
+
+### ship-it tracker support
+- **GitHub** — `gh` CLI required (`gh auth status`)
+- **Gitea** — `tea` CLI required (`go install code.gitea.io/tea@latest && tea login add`)
+- Auto-detected from `git remote get-url origin`
+
+### ship-it env vars
+
+| Var | Default | Purpose |
+|---|---|---|
+| `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) |
+| `SHIP_IT_MAX` | 50 | Iteration cap |
+| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
+| `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout |
+| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
+
+Example for EVOLV:
+```bash
+SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
+```
+
+### Issue label expected by ship-it
+The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up.
+
+---
+
 ## Troubleshooting

 **`ship-it` won't start: "tea CLI not installed".**
-The repo's remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.
+Repo remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.

 **`ship-it` exits immediately: "git tree is dirty".**
-Commit or stash everything before running. The loop won't risk mixing your work-in-progress into a slice.
+Commit or stash before running. The loop won't risk mixing WIP into a slice.

 **`ship-it` says "backlog empty" but I have open issues.**
-The filter requires label `slice` AND no `blocked`/`needs-decision`/`ci-failed`. Add the label, or check what's blocking.
+Filter requires label `slice` AND none of `blocked` / `needs-decision` / `ci-failed`. Check labels.

 **An issue keeps getting `needs-decision`.**
-Its acceptance criteria probably aren't testable at the outermost layer. Open the issue, rewrite criteria so they're observable (e.g. "POST /x returns 201 and row appears on dashboard" not "feature works"), remove the label, the loop will pick it up next run.
+Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.

-**Three failures in a row, loop stopped.**
-Something systemic — bad dependency, branch protection blocking, flaky test env. Check `.ship-it-logs/iter-*.log` for the per-issue detail.
+**`/prototype` keeps wanting to "tidy up" the spike before reporting.**
+That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."

-**Issue had a PR opened but didn't merge.**
-Branch protection requires human review. The loop reports it as "shipped → open for review" and moves on. Review and merge when you can.
+**`/research` returns shallow results.**
+The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").
+
+**`/prd-to-issues` drafts look like layer cake.**
+Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.

 ---

 ## Design principles

- **Vertical slices, always.** No "implement the backend first, then the frontend". Every issue exercises every layer.
- **Gaps are explicit, never hidden.** Hedged answers in grilling → Open Questions in PRD → `needs-decision` issues, not silent assumptions.
- **AFK only after the contract is locked.** Autonomous code only ships against decisions that already exist on paper.
+- **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
+- **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over.
+- **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer.
+- **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper.
+- **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`.
+- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
 - **One commit per slice.** Small, reviewable, revertible.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms the user-observable behavior actually works before reporting shipped.