RnD/EVOLV

Files

znetsixe fcaad8cd9f chore(skills): add /research and /prototype; rewrite README for 6-skill chain

Front-loads gap discovery before /grill-me by adding two skills:

  research    MOSTLY  fans out Explore + WebSearch agents in parallel,
                      synthesizes findings into a brief, names open
                      unknowns explicitly (which become /prototype targets)

  prototype   MOSTLY  builds a throwaway spike to test ONE falsifiable
                      assumption; code lives in .prototypes/ (gitignored),
                      never promoted; output is evidence — verdict, numbers,
                      observed behavior — that feeds /prd

Full chain now:
  /research → /prototype → /grill-me → /prd → /prd-to-issues → /ship-it

Chain rationale: /research and /prototype surface knowledge gaps and falsify
risky assumptions while the cost of changing direction is still cheap; the
TOGETHER phases (grill-me, prd) lock down the contract; the AFK phase
(ship-it) only executes against contracts already on paper.

The chain is a default, not a mandate — README covers when to skip
upstream skills for small or stack-familiar work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 16:33:26 +02:00

11 KiB

Raw Blame History

Workflow skills — research → prototype → grill-me → prd → prd-to-issues → ship-it

A six-skill chain that takes a vague idea from "I wonder if we could…" to merged, end-to-end-verified code. Four collaborative phases up front to lock down what's being built; two largely-autonomous phases that execute against that contract.

/research <topic>      MOSTLY      external + repo knowledge into a brief
       ↓
/prototype <claim>     MOSTLY      throwaway spike to test the riskiest assumption
       ↓
/grill-me <topic>      TOGETHER    pressure-test what survived
       ↓
/prd                   TOGETHER    synthesize PRD; gaps stay explicit
       ↓
/prd-to-issues         MOSTLY      thin vertical-slice issues; file on "create"
       ↓
/ship-it               AFK         shell loop ships every slice end-to-end

You don't have to use every skill on every feature. Small tweaks may skip /research and /prototype. Bigger / novel work uses the whole chain.

Mode taxonomy

Mode	Meaning	Skills
TOGETHER	Needs your turn-by-turn judgment. No autonomous path.	grill-me
MOSTLY TOGETHER	Drafts / fetches / builds AFK. You review the output. Any visible-to-team action needs your explicit "go".	research, prototype, prd, prd-to-issues
AFK	No human in the loop. Logs questions to issues instead of asking.	ship-it

The chain is structured so AFK execution only starts after the human-locked phases have nailed down the contract. Autonomous code never runs against undefined contracts.

When to use each

`/research <topic>` — MOSTLY TOGETHER

Fans out Explore + WebSearch agents in parallel, synthesizes findings into a research brief, and names open unknowns explicitly (which become candidates for /prototype).

Use when: the topic touches anything you haven't done before in this codebase. Novel libraries, unfamiliar patterns, "how do others solve this".

Don't use it for: stuff you already understand. The point is to fetch what you don't know, not summarize what you do.

`/prototype <claim>` — MOSTLY TOGETHER

Builds a throwaway spike to test ONE falsifiable assumption. Code lives in .prototypes/ (gitignored) and is never promoted to the main codebase. Output is evidence — verdict, numbers, observed behavior — that feeds the PRD.

Use when: /research surfaced an Open Unknown that "we'll find out when we build it". Better to find out for an hour of spike cost than a week of half-built feature.

Don't use it for: building a "lightweight v0" you secretly plan to evolve. Prototypes are evidence; production code is the real implementation. The skill rejects scope creep mid-spike.

`/grill-me <topic>` — TOGETHER

Senior staff engineer running a brutal-but-fair interview. One hard question at a time, honest critique (no praise filler), drills into weak spots. Stay on topic until exhausted; say stop for an honest 3-bullet debrief.

Use when: /research and /prototype (if used) have built up enough context that you need to pressure-test your own thinking before locking it in a PRD.

Don't use it for: rubber-stamping a finished idea, or when you want validation. Designed to find gaps, not to agree.

`/prd` — TOGETHER (drafts AFK after grilling)

Engineering PRD: Problem, Goals, Non-goals, Users & scenarios, Functional + Non-functional requirements, Constraints, Success metrics, Open Questions, Out of scope. Things you nailed in grilling become firm requirements. Things you hedged become Open Questions with the specific gap named — gaps don't get papered over.

Use when: the grilling exposed enough that the feature shape is clear. Or standalone when you already have full context.

Don't use it for: strategy decks, market sizing, "why now". This is for engineering.

`/prd-to-issues` — MOSTLY TOGETHER

Breaks the PRD into thin vertical slices — each issue cuts end-to-end through every integration layer (schema → service → API → UI → tests; or sensor → broker → parser → store → dashboard). First slice is a walking skeleton. Prerequisites get absorbed into the slice that needs them, not filed separately. Per-issue Slice check block proves every layer is covered, plus a coverage matrix at the top of the draft showing PRD → issue mapping. Self-audit runs before the draft is shown to you.

Output: draft inline → you reply create → files to the tracker (gh for GitHub, tea for Gitea).

Don't use it for: horizontal task lists ("DB work", "API work", "frontend work"). The skill rejects layer-cake slicing.

`/ship-it` — AFK

Shell loop in .claude/skills/ship-it/loop.sh. Picks the next ready issue, dispatches a fresh headless Claude to ship it end-to-end (failing e2e test first → implement layer by layer → full suite → outermost-layer smoke check → commit Closes #N → PR with acceptance-criteria checkboxes + smoke evidence → CI gate → merge or leave-for-review), then moves on. One commit per issue. Status streams to terminal; tail logs from another shell; Ctrl-C anytime.

Undecidable issues get labeled needs-decision and skipped. Three consecutive failures stops the loop for human review.

Don't use it for: issues whose acceptance criteria aren't testable. The loop will skip them.

A worked example

Adding live sensor display for a new flow meter to the operator dashboard.

# 1. Fetch what we don't already know
/research adding live flow-meter readings to the operator dashboard
# Brief lands; surfaces an Open Unknown:
#   "Can Node-RED sustain 1Hz updates across 12 dashboard panels for 10 min
#    straight without dropping frames?"

# 2. Test the risky assumption
/prototype Node-RED can stream 1Hz updates to 12 Grafana panels for 10 min straight
# Spike runs in .prototypes/nodered-throughput/;
# Verdict: confirmed, 14% CPU peak. Evidence captured. Prototype stays gitignored.

# 3. Pressure-test the design
/grill-me adding live flow-meter readings to the operator dashboard
# 6–8 hard questions; surfaces gaps in alerting and missing-data handling.

# 4. Lock down the contract
/prd
# PRD drafts with the alerting decision as a firm requirement and the
# missing-data behavior as an explicit Open Question.

# 5. Slice it
/prd-to-issues
# 5 slices; coverage matrix confirms every PRD requirement maps to a slice.
# Reply `create` → issues #142..#146 filed.

# 6. Walk away
/ship-it
# Preflight, plan, "Start? Reply `go`." → `go` → shell loop runs.
# Another terminal:  tail -f .ship-it-logs/run-*.log

After ship-it exits, the summary tells you what shipped, what's open for review, what hit needs-decision.

Skipping skills

The chain is a default, not a mandate:

Tiny well-understood change: straight to /prd-to-issues (or file the issue by hand and run /ship-it).
Bigger but stack-familiar: skip /research and /prototype; start at /grill-me.
Pure research, no implementation yet: stop after /research or /prototype — the brief or findings are the deliverable.
Existing PRD from somewhere else: /prd-to-issues <path> and go.

File layout

.claude/skills/
├── README.md                ← this file
├── research/
│   └── SKILL.md
├── prototype/
│   └── SKILL.md
├── grill-me/
│   └── SKILL.md
├── prd/
│   └── SKILL.md
├── prd-to-issues/
│   └── SKILL.md
└── ship-it/
    ├── SKILL.md             ← entry point; chat-side bootstrap
    ├── loop.sh              ← orchestrator (the actual loop)
    └── iterate.md           ← per-issue prompt the loop dispatches

.prototypes/                 ← throwaway spike code (gitignored, created by /prototype)
.ship-it-logs/               ← ship-it loop logs (recommend gitignoring)
docs/
├── research/                ← saved research briefs ("save it" in /research)
└── prd/                     ← saved PRDs ("save it" in /prd)

Configuration

ship-it tracker support

GitHub — gh CLI required (gh auth status)
Gitea — tea CLI required (go install code.gitea.io/tea@latest && tea login add)
Auto-detected from git remote get-url origin

ship-it env vars

Var	Default	Purpose
`SHIP_IT_TRUNK`	`main`	Trunk branch (set to `development` for the EVOLV repo)
`SHIP_IT_MAX`	50	Iteration cap
`SHIP_IT_MAX_FAIL`	3	Consecutive failures before stop
`SHIP_IT_TIMEOUT`	30m	Per-issue timeout
`SHIP_IT_LOG_DIR`	`<repo>/.ship-it-logs`	Log directory

Example for EVOLV:

SHIP_IT_TRUNK=development bash .claude/skills/ship-it/loop.sh

Issue label expected by ship-it

The loop filters to open issues with label slice and without blocked, needs-decision, or ci-failed. /prd-to-issues applies slice by default. If you file issues by hand, add the label or ship-it won't pick them up.

Troubleshooting

ship-it won't start: "tea CLI not installed". Repo remote is Gitea but you don't have tea. Install it (go install code.gitea.io/tea@latest && tea login add) or push to a GitHub mirror.

ship-it exits immediately: "git tree is dirty". Commit or stash before running. The loop won't risk mixing WIP into a slice.

ship-it says "backlog empty" but I have open issues. Filter requires label slice AND none of blocked / needs-decision / ci-failed. Check labels.

An issue keeps getting needs-decision. Acceptance criteria probably aren't testable at the outermost layer. Rewrite as observable (e.g. "POST /x returns 201 and row appears on dashboard"), drop the label, rerun.

/prototype keeps wanting to "tidy up" the spike before reporting. That's a sign the assumption isn't sharp enough — Claude is filling time. Sharpen the assumption and rerun, or just say "stop, report what you have now."

/research returns shallow results. The decomposed questions were too broad. Ask it to redo with a tighter scope, or constrain ("only this repo" / "only Node-RED + InfluxDB stack").

/prd-to-issues drafts look like layer cake. Stop, say "reslice — these are horizontal." The skill's self-audit should catch this, but if it doesn't, push back explicitly.

Design principles

Front-load gap discovery. Research, prototype, grill-me, PRD — each phase exists to surface gaps before they cost real implementation time.
Gaps are explicit, never hidden. Open Unknowns in /research → spike claims in /prototype → Open Questions in PRD → needs-decision labels on issues. Nothing gets papered over.
Vertical slices, always. No "implement the backend first". Every slice exercises every layer.
AFK only after the contract is locked. Autonomous code only runs against decisions already on paper.
Throwaway means throwaway. Prototypes are evidence; the real implementation in production code happens fresh in /ship-it.
Outermost-layer verification. "Tests pass" isn't enough — the loop confirms user-observable behavior actually works before reporting shipped.
One commit per slice. Small, reviewable, revertible.

11 KiB Raw Blame History Unescape Escape