EVOLV/.claude/skills/README.md
znetsixe fcaad8cd9f chore(skills): add /research and /prototype; rewrite README for 6-skill chain
Front-loads gap discovery before /grill-me by adding two skills:

  research    MOSTLY  fans out Explore + WebSearch agents in parallel,
                      synthesizes findings into a brief, names open
                      unknowns explicitly (which become /prototype targets)

  prototype   MOSTLY  builds a throwaway spike to test ONE falsifiable
                      assumption; code lives in .prototypes/ (gitignored),
                      never promoted; output is evidence — verdict, numbers,
                      observed behavior — that feeds /prd

Full chain now:
  /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it

Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.

The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:33:26 +02:00

# Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it
A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front lock down what's being built; two largely autonomous phases then execute against that contract.
```
/research <topic>    MOSTLY    external + repo knowledge into a brief
/prototype <claim>   MOSTLY    throwaway spike to test the riskiest assumption
/grill-me <topic>    TOGETHER  pressure-test what survived
/prd                 TOGETHER  synthesize PRD; gaps stay explicit
/prd-to-issues       MOSTLY    thin vertical-slice issues; file on "create"
/ship-it             AFK       shell loop ships every slice end-to-end
```
You don't have to use every skill on every feature. Small tweaks may skip `/research` and `/prototype`. Bigger / novel work uses the whole chain.
## Mode taxonomy
| Mode | Meaning | Skills |
|---|---|---|
| **TOGETHER** | Needs your turn-by-turn judgment. No autonomous path. | grill-me, prd (drafts AFK after grilling) |
| **MOSTLY TOGETHER** | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd-to-issues |
| **AFK** | No human in the loop. Logs questions to issues instead of asking. | ship-it |
The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.
---
## When to use each
### `/research <topic>` — MOSTLY TOGETHER
Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`).
**Use when:** the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".
**Don't use it for:** stuff you already understand. The point is to fetch what you don't know, not summarize what you do.
### `/prototype <claim>` — MOSTLY TOGETHER
Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
**Use when:** `/research` surfaced an Open Unknown of the "we'll find out when we build it" kind. An hour of spike work is a cheaper way to find out than a week spent on a half-built feature.
**Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
### `/grill-me <topic>` — TOGETHER
A senior staff engineer persona runs a brutal-but-fair interview: one hard question at a time, honest critique (no praise filler), drilling into weak spots. It stays on a topic until that topic is exhausted; say `stop` for an honest 3-bullet debrief.
**Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.
**Don't use it for:** rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.
### `/prd` — TOGETHER (drafts AFK after grilling)
Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
**Use when:** the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.
**Don't use it for:** strategy decks, market sizing, "why now". This is for engineering.
### `/prd-to-issues` — MOSTLY TOGETHER
Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you.
**Output:** draft inline → you reply `create` → files to the tracker (`gh` for GitHub, `tea` for Gitea).
**Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.
### `/ship-it` — AFK
Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.
Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stop the loop for human review.
**Don't use it for:** issues whose acceptance criteria aren't testable. The loop will skip them.
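The three-consecutive-failures stop rule can be sketched as a counter that resets on every success. This is a minimal illustration under stated assumptions, not the contents of `loop.sh`: `run_until_stuck` and `ship_one` are hypothetical names, `ship_one` stands in for whatever ships a single issue, and reading the threshold from `SHIP_IT_MAX_FAIL` is a guess at how that variable is consumed.

```bash
# Sketch only: stop after N consecutive failures, resetting the streak on success.
# ship_one is a hypothetical placeholder for the per-issue work; define it yourself.
run_until_stuck() {
  local max_fail="${SHIP_IT_MAX_FAIL:-3}"
  local fails=0 shipped=0 issue
  for issue in "$@"; do
    if ship_one "$issue"; then
      fails=0                        # any success resets the streak
      shipped=$((shipped + 1))
    else
      fails=$((fails + 1))
      if [ "$fails" -ge "$max_fail" ]; then
        echo "stopped after $fails consecutive failures (shipped $shipped)"
        return 1
      fi
    fi
  done
  echo "backlog done (shipped $shipped)"
}
```

Counting consecutive failures rather than total failures lets one flaky issue fail without halting a long run, while a systemic problem (broken CI, bad credentials) still stops the loop quickly.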
---
## A worked example
Adding live sensor display for a new flow meter to the operator dashboard.
```bash
# 1. Fetch what we don't already know
/research adding live flow-meter readings to the operator dashboard
# Brief lands; surfaces an Open Unknown:
# "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
# straight without dropping frames?"
# 2. Test the risky assumption
/prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
# Spike runs in .prototypes/nodered-throughput/;
# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.
# 3. Pressure-test the design
/grill-me adding live flow-meter readings to the operator dashboard
# 6-8 hard questions; surfaces gaps in alerting and missing-data handling.
# 4. Lock down the contract
/prd
# PRD drafts with the alerting decision as a firm requirement and the
# missing-data behavior as an explicit Open Question.
# 5. Slice it
/prd-to-issues
# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
# Reply `create` → issues #142..#146 filed.
# 6. Walk away
/ship-it
# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
# Another terminal: tail -f .ship-it-logs/run-*.log
```
After ship-it exits, the summary tells you what shipped, what's open for review, and what hit `needs-decision`.
## Skipping skills
The chain is a default, not a mandate:
- **Tiny well-understood change:** straight to `/prd-to-issues` (or file the issue by hand and run `/ship-it`).
- **Bigger but stack-familiar:** skip `/research` and `/prototype`; start at `/grill-me`.
- **Pure research, no implementation yet:** stop after `/research` or `/prototype` — the brief or findings are the deliverable.
- **Existing PRD from somewhere else:** `/prd-to-issues <path>` and go.
---
## File layout
```
.claude/skills/
├── README.md                 ← this file
├── research/
│   └── SKILL.md
├── prototype/
│   └── SKILL.md
├── grill-me/
│   └── SKILL.md
├── prd/
│   └── SKILL.md
├── prd-to-issues/
│   └── SKILL.md
└── ship-it/
    ├── SKILL.md              ← entry point; chat-side bootstrap
    ├── loop.sh               ← orchestrator (the actual loop)
    └── iterate.md            ← per-issue prompt the loop dispatches

.prototypes/                  ← throwaway spike code (gitignored, created by /prototype)
.ship-it-logs/                ← ship-it loop logs (recommend gitignoring)
docs/
├── research/                 ← saved research briefs ("save it" in /research)
└── prd/                      ← saved PRDs ("save it" in /prd)
```
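Both `.prototypes/` and `.ship-it-logs/` are meant to stay out of version control. One way to add them to `.gitignore` without creating duplicate entries is sketched below; `ensure_ignored` is a hypothetical helper for illustration, not something the skills ship.

```bash
# Sketch: append an entry to a .gitignore file only if it is not already listed.
# ensure_ignored is a hypothetical helper name.
ensure_ignored() {
  local entry="$1" file="${2:-.gitignore}"
  touch "$file"
  # -x matches whole lines only, -F treats the entry as a literal string
  grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

# Usage, from the repo root:
#   ensure_ignored .prototypes/
#   ensure_ignored .ship-it-logs/
```

The helper assumes the existing file ends with a newline; if it does not, the appended entry would be joined to the last line.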
---
## Configuration
### ship-it tracker support
- **GitHub** — `gh` CLI required (`gh auth status`)
- **Gitea** — `tea` CLI required (`go install code.gitea.io/tea@latest && tea login add`)
- Auto-detected from `git remote get-url origin`
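
Detection from the remote URL could look roughly like the sketch below. It is an illustration under stated assumptions, not the logic in `loop.sh`: `detect_tracker` is a hypothetical name, and treating every non-`github.com` host as Gitea is an assumption based on these being the only two trackers listed.

```bash
# Sketch: classify a git remote URL as "github" or "gitea".
# Assumption: any host other than github.com is treated as Gitea.
detect_tracker() {
  # With no argument, fall back to the origin remote of the current repo.
  local url="${1:-$(git remote get-url origin)}"
  case "$url" in
    *github.com[:/]*) echo github ;;   # matches both SSH and HTTPS forms
    *)                echo gitea ;;
  esac
}

# Usage:
#   detect_tracker                                # inspects origin
#   detect_tracker git@github.com:org/repo.git    # explicit URL
```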
### ship-it env vars
| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) |
| `SHIP_IT_MAX` | `50` | Iteration cap |
| `SHIP_IT_MAX_FAIL` | `3` | Consecutive failures before stop |
| `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
Example for EVOLV:
```bash
SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
```
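To see which values a run would use, the defaults from the table can be resolved with ordinary shell parameter expansion. This is a sketch for inspection only: `ship_it_config` is a hypothetical helper, and `loop.sh` may resolve its settings differently.

```bash
# Sketch: print ship-it settings, falling back to the documented defaults.
# ship_it_config is a hypothetical helper, not part of loop.sh.
ship_it_config() {
  local repo
  # Repo root if inside a git checkout, otherwise the current directory.
  repo="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
  echo "trunk=${SHIP_IT_TRUNK:-main}"
  echo "max=${SHIP_IT_MAX:-50}"
  echo "max_fail=${SHIP_IT_MAX_FAIL:-3}"
  echo "timeout=${SHIP_IT_TIMEOUT:-30m}"
  echo "log_dir=${SHIP_IT_LOG_DIR:-$repo/.ship-it-logs}"
}

# Usage:
#   SHIP_IT_TRUNK=development ship_it_config
```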
### Issue label expected by ship-it
The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up.
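
The filter rule can be expressed as a small predicate over an issue's labels. The sketch below is illustrative only: `is_ready` is a hypothetical helper, and passing labels as a comma-separated string is an assumption made to keep the example self-contained.

```bash
# Sketch: succeed only if the label set includes "slice" and none of the
# excluded labels. Labels are passed as one comma-separated string.
is_ready() {
  local labels=",$1,"          # pad so every label is delimited on both sides
  case "$labels" in
    *,blocked,*|*,needs-decision,*|*,ci-failed,*) return 1 ;;  # excluded
  esac
  case "$labels" in
    *,slice,*) return 0 ;;     # must carry the slice label
    *)         return 1 ;;
  esac
}

# Usage:
#   is_ready "slice,backend" && echo "eligible"
```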
---
## Troubleshooting
**`ship-it` won't start: "tea CLI not installed".**
Repo remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.
**`ship-it` exits immediately: "git tree is dirty".**
Commit or stash before running. The loop won't risk mixing WIP into a slice.
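
To check the tree yourself before starting, `git status --porcelain` prints nothing when there are no staged, unstaged, or untracked changes. The sketch below wraps that check; `tree_is_clean` is a hypothetical helper, and whether `loop.sh` uses this exact check is an assumption.

```bash
# Sketch: succeed only when the working tree has no pending changes.
# Returns 2 if git itself fails (for example, the path is not a repository).
tree_is_clean() {
  local out
  out="$(git -C "${1:-.}" status --porcelain)" || return 2
  [ -z "$out" ]
}

# Usage:
#   tree_is_clean || echo "commit or stash first" >&2
```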
**`ship-it` says "backlog empty" but I have open issues.**
Filter requires label `slice` AND none of `blocked` / `needs-decision` / `ci-failed`. Check labels.
**An issue keeps getting `needs-decision`.**
Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.
**`/prototype` keeps wanting to "tidy up" the spike before reporting.**
That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."
**`/research` returns shallow results.**
The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").
**`/prd-to-issues` drafts look like layer cake.**
Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.
---
## Design principles
- **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
- **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over.
- **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer.
- **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper.
- **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
- **One commit per slice.** Small, reviewable, revertible.