A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.
The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.
Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`).
Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.
**Use when:** `/research` surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature.
**Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.
Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say `stop` for an honest 3-bullet debrief.
**Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.
Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.
Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you.
Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.
The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up.
Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.
**`/prototype` keeps wanting to "tidy up" the spike before reporting.**
That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."
- **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
- **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over.
- **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer.
- **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper.
- **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`.
- **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.