Front-loads gap discovery before /grill-me by adding two skills:
research MOSTLY fans out Explore + WebSearch agents in parallel,
synthesizes findings into a brief, names open
unknowns explicitly (which become /prototype targets)
prototype MOSTLY builds a throwaway spike to test ONE falsifiable
assumption; code lives in .prototypes/ (gitignored),
never promoted; output is evidence — verdict, numbers,
observed behavior — that feeds /prd
Full chain now:
/research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it
Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.
The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it
A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.
/research <topic> MOSTLY external + repo knowledge into a brief
↓
/prototype <claim> MOSTLY throwaway spike to test the riskiest assumption
↓
/grill-me <topic> TOGETHER pressure-test what survived
↓
/prd TOGETHER synthesize PRD; gaps stay explicit
↓
/prd-to-issues MOSTLY thin vertical-slice issues; file on "create"
↓
/ship-it AFK shell loop ships every slice end-to-end
You don't have to use every skill on every feature. Small tweaks may skip /research and /prototype. Bigger / novel work uses the whole chain.
Mode taxonomy
| Mode | Meaning | Skills |
|---|---|---|
| TOGETHER | Needs your turn-by-turn judgment. No autonomous path. | grill-me |
| MOSTLY TOGETHER | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd, prd-to-issues |
| AFK | No human in the loop. Logs questions to issues instead of asking. | ship-it |
The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.
When to use each
/research <topic> — MOSTLY TOGETHER
Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for /prototype).
Use when: the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".
Don't use it for: stuff you already understand. The point is to fetch what you don't know, not summarize what you do.
/prototype <claim> — MOSTLY TOGETHER
Builds a throwaway spike to test ONE falsifiable assumption. Code lives in .prototypes/ (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
Use when: /research surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature.
Don't use it for: building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
/grill-me <topic> — TOGETHER
Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say stop for an honest 3-bullet debrief.
Use when: /research and /prototype (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.
Don't use it for: rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.
/prd — TOGETHER (drafts AFK after grilling)
Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
Use when: the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.
Don't use it for: strategy decks, market sizing, "why now". This is for engineering.
/prd-to-issues — MOSTLY TOGETHER
Breaks the PRD into thin vertical slices — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue Slice check block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs before the draft is shown to you.
Output: draft inline → you reply create → files to the tracker (gh for GitHub, tea for Gitea).
Don't use it for: horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.
/ship-it — AFK
Shell loop in .claude/skills/ship-it/loop.sh. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit Closes #N → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.
Undecidable issues get labeled needs-decision and skipped. Three consecutive failures stops the loop for human review.
Don't use it for: issues whose acceptance criteria aren't testable. The loop will skip them.
A worked example
Adding live sensor display for a new flow meter to the operator dashboard.
# 1. Fetch what we don't already know
/research adding live flow-meter readings to the operator dashboard
# Brief lands; surfaces an Open Unknown:
# "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
# straight without dropping frames?"
# 2. Test the risky assumption
/prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
# Spike runs in .prototypes/nodered-throughput/;
# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.
# 3. Pressure-test the design
/grill-me adding live flow-meter readings to the operator dashboard
# 6–8 hard questions; surfaces gaps in alerting and missing-data handling.
# 4. Lock down the contract
/prd
# PRD drafts with the alerting decision as a firm requirement and the
# missing-data behavior as an explicit Open Question.
# 5. Slice it
/prd-to-issues
# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
# Reply `create` → issues #142..#146 filed.
# 6. Walk away
/ship-it
# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
# Another terminal: tail -f .ship-it-logs/run-*.log
After ship-it exits, the summary tells you what shipped, what's open for review, what hit needs-decision.
Skipping skills
The chain is a default, not a mandate:
- Tiny well-understood change: straight to
/prd-to-issues(or file the issue by hand and run/ship-it). - Bigger but stack-familiar: skip
/researchand/prototype; start at/grill-me. - Pure research, no implementation yet: stop after
/researchor/prototype— the brief or findings are the deliverable. - Existing PRD from somewhere else:
/prd-to-issues <path>and go.
File layout
.claude/skills/
├── README.md ← this file
├── research/
│ └── SKILL.md
├── prototype/
│ └── SKILL.md
├── grill-me/
│ └── SKILL.md
├── prd/
│ └── SKILL.md
├── prd-to-issues/
│ └── SKILL.md
└── ship-it/
├── SKILL.md ← entry point; chat-side bootstrap
├── loop.sh ← orchestrator (the actual loop)
└── iterate.md ← per-issue prompt the loop dispatches
.prototypes/ ← throwaway spike code (gitignored, created by /prototype)
.ship-it-logs/ ← ship-it loop logs (recommend gitignoring)
docs/
├── research/ ← saved research briefs ("save it" in /research)
└── prd/ ← saved PRDs ("save it" in /prd)
Configuration
ship-it tracker support
- GitHub —
ghCLI required (gh auth status) - Gitea —
teaCLI required (go install code.gitea.io/tea@latest && tea login add) - Auto-detected from
git remote get-url origin
ship-it env vars
| Var | Default | Purpose |
|---|---|---|
SHIP_IT_TRUNK |
main |
Trunk branch (set to development for the EVOLV repo) |
SHIP_IT_MAX |
50 | Iteration cap |
SHIP_IT_MAX_FAIL |
3 | Consecutive failures before stop |
SHIP_IT_TIMEOUT |
30m | Per-issue timeout |
SHIP_IT_LOG_DIR |
<repo>/.ship-it-logs |
Log directory |
Example for EVOLV:
SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
Issue label expected by ship-it
The loop filters to open issues with label slice and without blocked, needs-decision, or ci-failed. /prd-to-issues applies slice by default. If you file issues by hand, add the label or ship-it won't pick them up.
Troubleshooting
ship-it won't start: "tea CLI not installed".
Repo remote is Gitea but you don't have tea. Install it (go install code.gitea.io/tea@latest && tea login add) or push to a GitHub mirror.
ship-it exits immediately: "git tree is dirty".
Commit or stash before running. The loop won't risk mixing WIP into a slice.
ship-it says "backlog empty" but I have open issues.
Filter requires label slice AND none of blocked / needs-decision / ci-failed. Check labels.
An issue keeps getting needs-decision.
Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.
/prototype keeps wanting to "tidy up" the spike before reporting.
That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."
/research returns shallow results.
The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").
/prd-to-issues drafts look like layer cake.
Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.
Design principles
- Front-load gap discovery. Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
- Gaps are explicit, never hidden. Open Unknowns in
/research→ spike claims in/prototype→ Open Questions in PRD →needs-decisionlabels on issues. Nothing gets papered over. - Vertical slices, always. No "implement the backend first". Every slice exercises every layer.
- AFK only after the contract is locked. Autonomous code only runs against decisions already on paper.
- Throwaway means throwaway. Prototypes are evidence; the real implementation in production code happens fresh in
/ship-it. - Outermost-layer verification. "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
- One commit per slice. Small, reviewable, revertible.