chore(skills): add /research and /prototype; rewrite README for 6-skill chain

Front-loads gap discovery before /grill-me by adding two skills:

  research    MOSTLY  fans out Explore + WebSearch agents in parallel,
                      synthesizes findings into a brief, names open
                      unknowns explicitly (which become /prototype targets)

  prototype   MOSTLY  builds a throwaway spike to test ONE falsifiable
                      assumption; code lives in .prototypes/ (gitignored),
                      never promoted; output is evidence — verdict, numbers,
                      observed behavior — that feeds /prd

Full chain now:
  /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it

Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.

The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
znetsixe
2026-05-21 16:33:26 +02:00
parent 6ff262e96e
commit fcaad8cd9f
3 changed files with 265 additions and 93 deletions

View File

@@ -1,132 +1,127 @@
# Workflow skills — grill-me → prd → prd-to-issues → ship-it
# Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it
A four-skill chain that takes a vague feature idea from "I want to build X" to merged, end-to-end-verified code. Two phases are collaborative (you + Claude), two run autonomously (AFK).
A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.
```
/grill-me <topic> TOGETHER pressure-test the idea
/research <topic> MOSTLY external + repo knowledge into a brief
/prd TOGETHER synthesize a PRD; gaps stay explicit
/prototype <claim> MOSTLY throwaway spike to test the riskiest assumption
/prd-to-issues MOSTLY thin vertical-slice issues; file when you say "create"
/grill-me <topic> TOGETHER pressure-test what survived
/ship-it AFK shell loop ships every slice end-to-end
/prd TOGETHER synthesize PRD; gaps stay explicit
/prd-to-issues MOSTLY thin vertical-slice issues; file on "create"
/ship-it AFK shell loop ships every slice end-to-end
```
## The TOGETHER vs AFK distinction
You don't have to use every skill on every feature. Small tweaks may skip `/research` and `/prototype`. Bigger / novel work uses the whole chain.
## Mode taxonomy
| Mode | Meaning | Skills |
|---|---|---|
| **TOGETHER** | Needs your judgment every turn. No autonomous path. | `grill-me` |
| **MOSTLY TOGETHER** | Drafts/audits AFK, but a visible-to-team action needs your "go". | `prd`, `prd-to-issues` |
| **AFK** | No human in the loop. Logs questions to issues instead of asking. | `ship-it` |
| **TOGETHER** | Needs your turn-by-turn judgment. No autonomous path. | grill-me |
| **MOSTLY TOGETHER** | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd, prd-to-issues |
| **AFK** | No human in the loop. Logs questions to issues instead of asking. | ship-it |
Each skill's `SKILL.md` declares its mode in the first body line. The chain is designed so the AFK phase only starts after the human-locked phases have nailed down the contract (PRD requirements, slice acceptance criteria) — AFK code only ships against decisions that already exist on paper.
The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.
---
## When to use each
### `/research <topic>` — MOSTLY TOGETHER
Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`).
**Use when:** the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".
**Don't use it for:** stuff you already understand. The point is to fetch what you don't know, not summarize what you do.
### `/prototype <claim>` — MOSTLY TOGETHER
Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
**Use when:** `/research` surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature.
**Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
### `/grill-me <topic>` — TOGETHER
Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say `stop` for an honest 3-bullet debrief.
**Use when:** you have a fuzzy idea and want it pressure-tested before committing to scope. Also useful for interview prep on any technical topic.
**Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.
**Behavior:** acts as a senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots, demands specifics over buzzwords. Stay on the topic until exhausted; say `stop` for an honest 3-bullet debrief.
**Don't use it for:** rubber-stamping a finished idea, or when you want validation. It's designed to find gaps, not to agree with you.
**Don't use it for:** rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.
### `/prd` — TOGETHER (drafts AFK after grilling)
Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
**Use when:** the grilling exposed enough that you're ready to lock down what's being built. Or standalone when you already know the feature shape.
**Use when:** the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.
**Behavior:** writes an engineering PRD (Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope). Things you nailed in the grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
**Output:** inline by default. Say "save it" → writes to `docs/prd/<name>.md`.
**Don't use it for:** strategy docs, market sizing, or "why now" sections. This is for engineering.
**Don't use it for:** strategy decks, market sizing, "why now". This is for engineering.
### `/prd-to-issues` — MOSTLY TOGETHER
Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you.
**Use when:** you have a PRD (just-drafted or path-pointed) and need a backlog of issues a team can pick up.
**Output:** draft inline → you reply `create` → files to the tracker (`gh` for GitHub, `tea` for Gitea).
**Behavior:** breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). The first slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Each issue has a visible `Slice check` block proving every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** drafting is shown to you.
**Output:** draft inline → you reply `create` → it files to the tracker (`gh` for GitHub, `tea` for Gitea).
**Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing — if your work needs that, you want a different tool.
**Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.
### `/ship-it` — AFK
**Use when:** issues are filed, you want to walk away, you'll review the PRs later.
**Behavior:** runs a shell loop that picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with checked acceptance criteria + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to the terminal; you can `tail -f` the log from another shell or Ctrl-C anytime.
Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.
Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stops the loop for human review.
**Don't use it for:** issues whose acceptance criteria aren't testable (the loop will skip them), or for slices that need real-world side effects with no test harness.
**Don't use it for:** issues whose acceptance criteria aren't testable. The loop will skip them.
---
## A worked example
You want to add live sensor display for a new flow meter on a dashboard.
Adding live sensor display for a new flow meter to the operator dashboard.
```bash
# 1. Pressure-test the idea
# 1. Fetch what we don't already know
/research adding live flow-meter readings to the operator dashboard
# Brief lands; surfaces an Open Unknown:
# "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
# straight without dropping frames?"
# 2. Test the risky assumption
/prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
# Spike runs in .prototypes/nodered-throughput/;
# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.
# 3. Pressure-test the design
/grill-me adding live flow-meter readings to the operator dashboard
# 68 hard questions; surfaces gaps in alerting and missing-data handling.
# (you answer 6-8 hard questions; gaps surface)
# 2. Lock down the contract
# 4. Lock down the contract
/prd
# PRD drafts with the alerting decision as a firm requirement and the
# missing-data behavior as an explicit Open Question.
# (PRD drafts; you edit one section; save it)
# 3. Slice it
# 5. Slice it
/prd-to-issues
# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
# Reply `create` → issues #142..#146 filed.
# (draft shows ~5 slices, each with a Slice check block. Coverage matrix
# confirms every PRD requirement maps to a slice. Reply `create`.)
# → issues #142..#146 filed
# 4. Walk away
# 6. Walk away
/ship-it
# (preflight, plan, "Start? Reply `go`." → `go` → shell loop runs)
# In another terminal: tail -f .ship-it-logs/run-*.log
# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
# Another terminal: tail -f .ship-it-logs/run-*.log
```
After the loop exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`.
After ship-it exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`.
---
## Skipping skills
## Configuration
The chain is a default, not a mandate:
### Repo trunk branch
`ship-it` defaults to `main`. If your repo uses something else:
```bash
SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
```
### Other env vars
| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_MAX` | 50 | Iteration cap |
| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
| `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
### Tracker support
- **GitHub** — uses `gh` CLI (must be installed and authenticated: `gh auth status`).
- **Gitea** — uses `tea` CLI (must be installed: `go install code.gitea.io/tea@latest`, then `tea login add`).
- Auto-detected from `git remote get-url origin`.
### Issue label expected by `ship-it`
The loop filters to open issues with label `slice` and without labels `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies the `slice` label by default. If you file issues by hand, add the label or `ship-it` won't pick them up.
- **Tiny well-understood change:** straight to `/prd-to-issues` (or file the issue by hand and run `/ship-it`).
- **Bigger but stack-familiar:** skip `/research` and `/prototype`; start at `/grill-me`.
- **Pure research, no implementation yet:** stop after `/research` or `/prototype` — the brief or findings are the deliverable.
- **Existing PRD from somewhere else:** `/prd-to-issues <path>` and go.
---
@@ -134,7 +129,11 @@ The loop filters to open issues with label `slice` and without labels `blocked`,
```
.claude/skills/
├── README.md ← this file
├── README.md ← this file
├── research/
│ └── SKILL.md
├── prototype/
│ └── SKILL.md
├── grill-me/
│ └── SKILL.md
├── prd/
@@ -142,39 +141,77 @@ The loop filters to open issues with label `slice` and without labels `blocked`,
├── prd-to-issues/
│ └── SKILL.md
└── ship-it/
├── SKILL.md ← entry point; chat-side bootstrap
├── loop.sh ← orchestrator (the actual loop)
└── iterate.md ← per-issue prompt the loop dispatches
├── SKILL.md ← entry point; chat-side bootstrap
├── loop.sh ← orchestrator (the actual loop)
└── iterate.md ← per-issue prompt the loop dispatches
.prototypes/ ← throwaway spike code (gitignored, created by /prototype)
.ship-it-logs/ ← ship-it loop logs (recommend gitignoring)
docs/
├── research/ ← saved research briefs ("save it" in /research)
└── prd/ ← saved PRDs ("save it" in /prd)
```
---
## Configuration
### ship-it tracker support
- **GitHub** — `gh` CLI required (`gh auth status`)
- **Gitea** — `tea` CLI required (`go install code.gitea.io/tea@latest && tea login add`)
- Auto-detected from `git remote get-url origin`
### ship-it env vars
| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) |
| `SHIP_IT_MAX` | 50 | Iteration cap |
| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
| `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |
Example for EVOLV:
```bash
SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
```
### Issue label expected by ship-it
The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up.
---
## Troubleshooting
**`ship-it` won't start: "tea CLI not installed".**
The repo's remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.
Repo remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror.
**`ship-it` exits immediately: "git tree is dirty".**
Commit or stash everything before running. The loop won't risk mixing your work-in-progress into a slice.
Commit or stash before running. The loop won't risk mixing WIP into a slice.
**`ship-it` says "backlog empty" but I have open issues.**
The filter requires label `slice` AND no `blocked`/`needs-decision`/`ci-failed`. Add the label, or check what's blocking.
Filter requires label `slice` AND none of `blocked` / `needs-decision` / `ci-failed`. Check labels.
**An issue keeps getting `needs-decision`.**
Its acceptance criteria probably aren't testable at the outermost layer. Open the issue, rewrite criteria so they're observable (e.g. "POST /x returns 201 and row appears on dashboard" not "feature works"), remove the label, the loop will pick it up next run.
Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.
**Three failures in a row, loop stopped.**
Something systemic — bad dependency, branch protection blocking, flaky test env. Check `.ship-it-logs/iter-*.log` for the per-issue detail.
**`/prototype` keeps wanting to "tidy up" the spike before reporting.**
That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."
**Issue had a PR opened but didn't merge.**
Branch protection requires human review. The loop reports it as "shipped → open for review" and moves on. Review and merge when you can.
**`/research` returns shallow results.**
The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").
**`/prd-to-issues` drafts look like layer cake.**
Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.
---
## Design principles
- **Vertical slices, always.** No "implement the backend first, then the frontend". Every issue exercises every layer.
- **Gaps are explicit, never hidden.** Hedged answers in grilling → Open Questions in PRD → `needs-decision` issues, not silent assumptions.
- **AFK only after the contract is locked.** Autonomous code only ships against decisions that already exist on paper.
- **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
- **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over.
- **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer.
- **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper.
- **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
- **One commit per slice.** Small, reviewable, revertible.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms the user-observable behavior actually works before reporting shipped.