EVOLV/.claude/skills/README.md
znetsixe fcaad8cd9f chore(skills): add /research and /prototype; rewrite README for 6-skill chain
Front-loads gap discovery before /grill-me by adding two skills:

  research    MOSTLY  fans out Explore + WebSearch agents in parallel,
                      synthesizes findings into a brief, names open
                      unknowns explicitly (which become /prototype targets)

  prototype   MOSTLY  builds a throwaway spike to test ONE falsifiable
                      assumption; code lives in .prototypes/ (gitignored),
                      never promoted; output is evidence — verdict, numbers,
                      observed behavior — that feeds /prd

Full chain now:
  /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it

Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.

The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:33:26 +02:00

Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it

A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.

/research <topic>      MOSTLY      external + repo knowledge into a brief
       ↓
/prototype <claim>     MOSTLY      throwaway spike to test the riskiest assumption
       ↓
/grill-me <topic>      TOGETHER    pressure-test what survived
       ↓
/prd                   TOGETHER    synthesize PRD; gaps stay explicit
       ↓
/prd-to-issues         MOSTLY      thin vertical-slice issues; file on "create"
       ↓
/ship-it               AFK         shell loop ships every slice end-to-end

You don't have to use every skill on every feature. Small tweaks may skip /research and /prototype. Bigger / novel work uses the whole chain.

Mode taxonomy

| Mode | Meaning | Skills |
|---|---|---|
| TOGETHER | Needs your turn-by-turn judgment. No autonomous path. | grill-me |
| MOSTLY TOGETHER | Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go". | research, prototype, prd, prd-to-issues |
| AFK | No human in the loop. Logs questions to issues instead of asking. | ship-it |

The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.


When to use each

/research <topic> — MOSTLY TOGETHER

Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for /prototype).

Use when: the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".

Don't use it for: stuff you already understand. The point is to fetch what you don't know, not summarize what you do.

/prototype <claim> — MOSTLY TOGETHER

Builds a throwaway spike to test ONE falsifiable assumption. Code lives in .prototypes/ (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.

Use when: /research surfaced an Open Unknown of the "we'll find out when we build it" kind. Better to find out at the cost of an hour-long spike than a week into a half-built feature.

Don't use it for: building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.

/grill-me <topic> — TOGETHER

Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say stop for an honest 3-bullet debrief.

Use when: /research and /prototype (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.

Don't use it for: rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.

/prd — TOGETHER (drafts AFK after grilling)

Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.

Use when: the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.

Don't use it for: strategy decks, market sizing, "why now". This is for engineering.

/prd-to-issues — MOSTLY TOGETHER

Breaks the PRD into thin vertical slices — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue Slice check block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs before the draft is shown to you.

Output: draft inline → you reply create → files to the tracker (gh for GitHub, tea for Gitea).

Don't use it for: horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.

/ship-it — AFK

Shell loop in .claude/skills/ship-it/loop.sh. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit Closes #N → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.

Undecidable issues get labeled needs-decision and skipped. Three consecutive failures stop the loop for human review.
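
The stop rule can be pictured as a counter that resets on every success. The sketch below is an illustration only, not the contents of loop.sh: `run_until_stuck` and the `ok` / `fail` outcome tokens are hypothetical, and the threshold is read from `SHIP_IT_MAX_FAIL` with the documented default of 3.

```shell
#!/usr/bin/env bash
# Sketch only: run_until_stuck is a hypothetical helper, not part of loop.sh.
# Each argument stands in for one issue's outcome: "ok" or "fail".
run_until_stuck() {
  local max_fail="${SHIP_IT_MAX_FAIL:-3}"
  local streak=0 shipped=0 outcome
  for outcome in "$@"; do
    if [ "$outcome" = "ok" ]; then
      shipped=$((shipped + 1))
      streak=0                      # any success resets the streak
    else
      streak=$((streak + 1))
      if [ "$streak" -ge "$max_fail" ]; then
        echo "stopped: $streak consecutive failures, $shipped shipped"
        return 1
      fi
    fi
  done
  echo "done: $shipped shipped"
}
```

With the default threshold, `run_until_stuck ok fail fail ok fail fail fail ok` stops at the seventh outcome: the two early failures do not count because the success in between reset the streak.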

Don't use it for: issues whose acceptance criteria aren't testable. The loop will skip them.


A worked example

Adding live sensor display for a new flow meter to the operator dashboard.

# 1. Fetch what we don't already know
/research adding live flow-meter readings to the operator dashboard
# Brief lands; surfaces an Open Unknown:
#   "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
#    straight without dropping frames?"

# 2. Test the risky assumption
/prototype Node-RED can stream 1Hz updates to 12 dashboard panels for 10 min straight
# Spike runs in .prototypes/nodered-throughput/;
# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.

# 3. Pressure-test the design
/grill-me adding live flow-meter readings to the operator dashboard
# 6-8 hard questions; surfaces gaps in alerting and missing-data handling.

# 4. Lock down the contract
/prd
# PRD drafts with the alerting decision as a firm requirement and the
# missing-data behavior as an explicit Open Question.

# 5. Slice it
/prd-to-issues
# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
# Reply `create` → issues #142..#146 filed.

# 6. Walk away
/ship-it
# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
# Another terminal:  tail -f .ship-it-logs/run-*.log

After ship-it exits, the summary tells you what shipped, what's open for review, what hit needs-decision.

Skipping skills

The chain is a default, not a mandate:

  • Tiny well-understood change: straight to /prd-to-issues (or file the issue by hand and run /ship-it).
  • Bigger but stack-familiar: skip /research and /prototype; start at /grill-me.
  • Pure research, no implementation yet: stop after /research or /prototype — the brief or findings are the deliverable.
  • Existing PRD from somewhere else: /prd-to-issues <path> and go.

File layout

.claude/skills/
├── README.md                ← this file
├── research/
│   └── SKILL.md
├── prototype/
│   └── SKILL.md
├── grill-me/
│   └── SKILL.md
├── prd/
│   └── SKILL.md
├── prd-to-issues/
│   └── SKILL.md
└── ship-it/
    ├── SKILL.md             ← entry point; chat-side bootstrap
    ├── loop.sh              ← orchestrator (the actual loop)
    └── iterate.md           ← per-issue prompt the loop dispatches

.prototypes/                 ← throwaway spike code (gitignored, created by /prototype)
.ship-it-logs/               ← ship-it loop logs (recommend gitignoring)
docs/
├── research/                ← saved research briefs ("save it" in /research)
└── prd/                     ← saved PRDs ("save it" in /prd)
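
One way to keep the two scratch directories out of version control is an idempotent append to `.gitignore`. This is a minimal sketch, assuming the ignore file sits at the repo root; `ensure_ignored` is a hypothetical helper, not something the skills ship.

```shell
#!/usr/bin/env bash
# Sketch only: ensure_ignored is a hypothetical helper name.
# Appends each pattern to the given ignore file unless that exact line exists.
ensure_ignored() {
  local file="$1"; shift
  local pattern
  touch "$file"
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the appended pattern does not fuse with the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for pattern in "$@"; do
    # -x: whole-line match, -F: literal string, -q: quiet
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
  done
}
```

Run from the repo root as `ensure_ignored .gitignore .prototypes/ .ship-it-logs/`. Running it a second time adds nothing.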

Configuration

ship-it tracker support

  • GitHub: gh CLI required (gh auth status)
  • Gitea: tea CLI required (go install code.gitea.io/tea@latest && tea login add)
  • Auto-detected from git remote get-url origin
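
The auto-detection could look roughly like the sketch below. The matching rule is an assumption (a `github.com` host means GitHub, anything else is treated as Gitea); the actual logic in loop.sh may differ, and `detect_tracker` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: detect_tracker is hypothetical; the real rule may differ.
# Takes a remote URL and prints the CLI that would be used for it.
detect_tracker() {
  local url="$1"
  case "$url" in
    *github.com[/:]*) echo "gh" ;;    # assumption: github.com host => GitHub
    *)                echo "tea" ;;   # assumption: everything else => Gitea
  esac
}

# Typical call site, inside a repository:
#   detect_tracker "$(git remote get-url origin)"
```

Matching on the URL string handles both SSH (`git@github.com:org/repo.git`) and HTTPS remotes without parsing them.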

ship-it env vars

| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_TRUNK` | `main` | Trunk branch (set to `development` for the EVOLV repo) |
| `SHIP_IT_MAX` | `50` | Iteration cap |
| `SHIP_IT_MAX_FAIL` | `3` | Consecutive failures before stop |
| `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Log directory |

Example for EVOLV:

SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh
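
For orientation, the documented defaults map onto standard shell parameter expansion. The following is a sketch of how a script could read them, not the contents of loop.sh; the `load_ship_it_config` name and the fallback to the current directory outside a git repo are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: load_ship_it_config is hypothetical, not the real loop.sh.
load_ship_it_config() {
  local repo_root
  # Assumption: fall back to the current directory when not inside a repo.
  repo_root="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"

  # ${VAR:=default} keeps a caller-supplied value and fills in the rest.
  : "${SHIP_IT_TRUNK:=main}"
  : "${SHIP_IT_MAX:=50}"
  : "${SHIP_IT_MAX_FAIL:=3}"
  : "${SHIP_IT_TIMEOUT:=30m}"
  : "${SHIP_IT_LOG_DIR:=$repo_root/.ship-it-logs}"
}
```

A value set by the caller, such as `SHIP_IT_TRUNK=development`, survives the call unchanged; only unset or empty variables receive the default.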

Issue label expected by ship-it

The loop filters to open issues with label slice and without blocked, needs-decision, or ci-failed. /prd-to-issues applies slice by default. If you file issues by hand, add the label or ship-it won't pick them up.
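
The filter amounts to one required label plus a deny list. Below is a minimal sketch of that rule over a comma-separated label string; `is_ready` is a hypothetical helper, and how loop.sh actually queries the tracker is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: is_ready is hypothetical. Input: comma-separated labels.
# Succeeds when "slice" is present and no excluding label is.
is_ready() {
  local labels=",$1,"          # pad so every label is wrapped in commas
  case "$labels" in
    *,blocked,*|*,needs-decision,*|*,ci-failed,*) return 1 ;;
  esac
  case "$labels" in
    *,slice,*) return 0 ;;
    *)         return 1 ;;
  esac
}
```

For example, `is_ready "slice,frontend"` succeeds, while `is_ready "slice,blocked"` and `is_ready "frontend"` both fail. The comma padding keeps a label such as `slice-draft` from matching `slice`.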


Troubleshooting

ship-it won't start: "tea CLI not installed". Repo remote is Gitea but you don't have tea. Install it (go install code.gitea.io/tea@latest && tea login add) or push to a GitHub mirror.

ship-it exits immediately: "git tree is dirty". Commit or stash before running. The loop won't risk mixing WIP into a slice.
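
To check the tree yourself before starting the loop, `git status --porcelain` is a convenient probe: it prints nothing when there is nothing to commit. A sketch, where `tree_is_clean` is a hypothetical helper and treating untracked files as dirty is an assumption about what the loop checks.

```shell
#!/usr/bin/env bash
# Sketch only: tree_is_clean is hypothetical.
# Assumption: untracked files count as dirty, as in default porcelain output.
tree_is_clean() {
  local dir="${1:-.}"
  local out
  # Return 2 if git itself fails (for example, not a repository).
  out="$(git -C "$dir" status --porcelain)" || return 2
  [ -z "$out" ]
}

# Typical use before starting the loop:
#   tree_is_clean || git stash push --include-untracked
```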

ship-it says "backlog empty" but I have open issues. Filter requires label slice AND none of blocked / needs-decision / ci-failed. Check labels.

An issue keeps getting needs-decision. Acceptance criteria probably aren't testable at the outermost layer. Rewrite them as observable behavior (e.g. "POST /x returns 201 and row appears on dashboard"), remove the needs-decision label, rerun.

/prototype keeps wanting to "tidy up" the spike before reporting. That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."

/research returns shallow results. The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").

/prd-to-issues drafts look like layer cake. Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.


Design principles

  • Front-load gap discovery. Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
  • Gaps are explicit, never hidden. Open Unknowns in /research → spike claims in /prototype → Open Questions in PRD → needs-decision labels on issues. Nothing gets papered over.
  • Vertical slices, always. No "implement the backend first". Every slice exercises every layer.
  • AFK only after the contract is locked. Autonomous code only runs against decisions already on paper.
  • Throwaway means throwaway. Prototypes are evidence; the real implementation in production code happens fresh in /ship-it.
  • Outermost-layer verification. "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
  • One commit per slice. Small, reviewable, revertible.