# Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract. ``` /research MOSTLY external + repo knowledge into a brief ↓ /prototype MOSTLY throwaway spike to test the riskiest assumption ↓ /grill-me TOGETHER pressure-test what survived ↓ /prd TOGETHER synthesize PRD; gaps stay explicit ↓ /prd-to-issues MOSTLY thin vertical-slice issues; file on "create" ↓ /ship-it AFK shell loop ships every slice end-to-end ``` You don't have to use every skill on every feature. Small tweaks may skip `/research` and `/prototype`. Bigger / novel work uses the whole chain. ## Mode taxonomy | Mode | Meaning | Skills | |---|---|---| | **TOGETHER** | Needs your turn-by-turn judgment. No autonomous path. | grill-me | | **MOSTLY TOGETHER** | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd, prd-to-issues | | **AFK** | No human in the loop. Logs questions to issues instead of asking. | ship-it | The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts. --- ## When to use each ### `/research ` — MOSTLY TOGETHER Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for `/prototype`). **Use when:** the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this". **Don't use it for:** stuff you already understand. The point is to fetch what you don't know, not summarize what you do. ### `/prototype ` — MOSTLY TOGETHER Builds a throwaway spike to test ONE falsifiable assumption. Code lives in `.prototypes/` (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD. **Use when:** `/research` surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature. **Don't use it for:** building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike. ### `/grill-me ` — TOGETHER Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say `stop` for an honest 3-bullet debrief. **Use when:** `/research` and `/prototype` (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD. **Don't use it for:** rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree. ### `/prd` — TOGETHER (drafts AFK after grilling) Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over. **Use when:** the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context. **Don't use it for:** strategy decks, market sizing, "why now". This is for engineering. ### `/prd-to-issues` — MOSTLY TOGETHER Breaks the PRD into **thin vertical slices** — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue `Slice check` block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs **before** the draft is shown to you. **Output:** draft inline → you reply `create` → files to the tracker (`gh` for GitHub, `tea` for Gitea). **Don't use it for:** horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing. ### `/ship-it` — AFK Shell loop in `.claude/skills/ship-it/loop.sh`. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit `Closes #N` → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime. Undecidable issues get labeled `needs-decision` and skipped. Three consecutive failures stops the loop for human review. **Don't use it for:** issues whose acceptance criteria aren't testable. The loop will skip them. --- ## A worked example Adding live sensor display for a new flow meter to the operator dashboard. ```bash # 1. Fetch what we don't already know /research adding live flow-meter readings to the operator dashboard # Brief lands; surfaces an Open Unknown: # "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min # straight without dropping frames?" # 2. Test the risky assumption /prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight # Spike runs in .prototypes/nodered-throughput/; # Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored. # 3. Pressure-test the design /grill-me adding live flow-meter readings to the operator dashboard # 6–8 hard questions; surfaces gaps in alerting and missing-data handling. # 4. Lock down the contract /prd # PRD drafts with the alerting decision as a firm requirement and the # missing-data behavior as an explicit Open Question. # 5. Slice it /prd-to-issues # 5 slices; coverage matrix confirms every PRD requirement maps to a slice. # Reply `create` → issues #142..#146 filed. # 6. Walk away /ship-it # Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs. # Another terminal: tail -f .ship-it-logs/run-*.log ``` After ship-it exits, the summary tells you what shipped, what's open for review, what hit `needs-decision`. ## Skipping skills The chain is a default, not a mandate: - **Tiny well-understood change:** straight to `/prd-to-issues` (or file the issue by hand and run `/ship-it`). - **Bigger but stack-familiar:** skip `/research` and `/prototype`; start at `/grill-me`. - **Pure research, no implementation yet:** stop after `/research` or `/prototype` — the brief or findings are the deliverable. - **Existing PRD from somewhere else:** `/prd-to-issues ` and go. --- ## File layout ``` .claude/skills/ ├── README.md ← this file ├── research/ │ └── SKILL.md ├── prototype/ │ └── SKILL.md ├── grill-me/ │ └── SKILL.md ├── prd/ │ └── SKILL.md ├── prd-to-issues/ │ └── SKILL.md └── ship-it/ ├── SKILL.md ← entry point; chat-side bootstrap ├── loop.sh ← orchestrator (the actual loop) └── iterate.md ← per-issue prompt the loop dispatches .prototypes/ ← throwaway spike code (gitignored, created by /prototype) .ship-it-logs/ ← ship-it loop logs (recommend gitignoring) docs/ ├── research/ ← saved research briefs ("save it" in /research) └── prd/ ← saved PRDs ("save it" in /prd) ``` --- ## Configuration ### ship-it tracker support - **GitHub** — `gh` CLI required (`gh auth status`) - **Gitea** — `tea` CLI required (`go install code.gitea.io/tea@latest && tea login add`) - Auto-detected from `git remote get-url origin` ### ship-it env vars | Var | Default | Purpose | |---|---|---| | `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) | | `SHIP_IT_MAX` | 50 | Iteration cap | | `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop | | `SHIP_IT_TIMEOUT` | 30m | Per-issue timeout | | `SHIP_IT_LOG_DIR` | `/.ship-it-logs` | Log directory | Example for EVOLV: ```bash SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh ``` ### Issue label expected by ship-it The loop filters to open issues with label `slice` and without `blocked`, `needs-decision`, or `ci-failed`. `/prd-to-issues` applies `slice` by default. If you file issues by hand, add the label or ship-it won't pick them up. --- ## Troubleshooting **`ship-it` won't start: "tea CLI not installed".** Repo remote is Gitea but you don't have `tea`. Install it (`go install code.gitea.io/tea@latest && tea login add`) or push to a GitHub mirror. **`ship-it` exits immediately: "git tree is dirty".** Commit or stash before running. The loop won't risk mixing WIP into a slice. **`ship-it` says "backlog empty" but I have open issues.** Filter requires label `slice` AND none of `blocked` / `needs-decision` / `ci-failed`. Check labels. **An issue keeps getting `needs-decision`.** Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun. **`/prototype` keeps wanting to "tidy up" the spike before reporting.** That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now." **`/research` returns shallow results.** The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack"). **`/prd-to-issues` drafts look like layer cake.** Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly. --- ## Design principles - **Front-load gap discovery.** Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time. - **Gaps are explicit, never hidden.** Open Unknowns in `/research` → spike claims in `/prototype` → Open Questions in PRD → `needs-decision` labels on issues. Nothing gets papered over. - **Vertical slices, always.** No "implement the backend first". Every slice exercises every layer. - **AFK only after the contract is locked.** Autonomous code only runs against decisions already on paper. - **Throwaway means throwaway.** Prototypes are evidence; the real implementation in production code happens fresh in `/ship-it`. - **Outermost-layer verification.** "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped. - **One commit per slice.** Small, reviewable, revertible.