---
name: ship-it
description: AFK autopilot. Drives a shell loop that works through every ready issue in the tracker (GitHub via gh, Gitea via tea), implementing each vertical slice end-to-end and committing per issue. Status streams to the terminal so the human can tail progress locally and Ctrl-C anytime. The shell is the loop; each iteration dispatches one fresh headless Claude run to ship one issue. Use when the user invokes /ship-it, says "go AFK on this", "work the backlog", "ralph the issues", or "ship everything".
---
# Ship It — AFK backlog autopilot
**Mode: AFK.** No human in the loop. Does not ask questions mid-run. If a slice is undecidable, the iteration labels the issue `needs-decision` and the loop moves on. The human gets one summary at the end, not chatter during the run.
## How this works (read before invoking)
The actual loop runs in a shell script: `.claude/skills/ship-it/loop.sh`. **The shell is the loop**, not you. Each iteration shells out to a fresh, headless `claude -p` invocation that processes exactly one issue using `.claude/skills/ship-it/iterate.md` as its prompt. Three reasons this design beats "LLM keeps going inside one session":
1. **Fresh context per issue.** No drift, no accumulated history bloating the window.
2. **Visible in the terminal.** Progress streams to stdout and tees to a log file. The human can tail it from another shell, see commits land, and Ctrl-C cleanly.
3. **Survives session close.** Closing the interactive Claude window doesn't kill the loop. Re-attach by tailing the log.
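
The shape described above can be sketched roughly as follows. This is a minimal illustration, not the contents of the real `loop.sh`: the helper names (`next_ready_issue`, `dispatch_issue`, `run_loop`) and the way the issue number is appended to the prompt are assumptions, and the issue query is simplified.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and prompt handling are assumptions,
# not taken from the real loop.sh.

SHIP_IT_MAX="${SHIP_IT_MAX:-50}"
SHIP_IT_TIMEOUT="${SHIP_IT_TIMEOUT:-30m}"
PROMPT_FILE=".claude/skills/ship-it/iterate.md"

# Hypothetical: print the lowest open `slice` issue number, or nothing.
# (Simplified: the exclusion labels are ignored here.)
next_ready_issue() {
  gh issue list --state open --label slice --json number \
    --jq 'map(.number) | min // empty'
}

# One fresh headless run per issue; `timeout` kills it if it overruns.
dispatch_issue() {
  local issue="$1"
  timeout "$SHIP_IT_TIMEOUT" claude -p "$(cat "$PROMPT_FILE")

Issue to ship: #$issue"
}

# The loop itself: no state is carried between iterations except the tracker.
run_loop() {
  local i issue
  for ((i = 1; i <= SHIP_IT_MAX; i++)); do
    issue="$(next_ready_issue)"
    if [[ -z "$issue" ]]; then
      echo "backlog empty"
      return 0
    fi
    echo "[$i] shipping #$issue"
    dispatch_issue "$issue" || echo "[$i] #$issue failed"
  done
  echo "iteration cap reached"
}
```

Because each `claude -p` call is a separate process, nothing from one issue's conversation leaks into the next; the tracker and git are the only shared state.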
## Files
- `loop.sh` — orchestrator. Tracker detection, preflight, dispatch loop, status output, stop conditions, summary.
- `iterate.md` — the prompt passed to each per-issue headless Claude. Read it; it defines what "shipped" means.
- `SKILL.md` — this file. When the user invokes `/ship-it`, you bootstrap and hand off.
## When the user invokes /ship-it
You (the interactive Claude) do the bootstrap, not the work. Concretely:
1. **Preflight in chat** (catches the obvious failures before the script runs):
- `git status --porcelain` empty?
- On `main` (or `$SHIP_IT_TRUNK`)? Up-to-date with origin?
- `gh auth status` exits 0 (or, on Gitea, the `tea` token is valid)?
- `gh issue list --state open --label slice | wc -l` ≥ 1?
2. **Show the plan** in one short block: tracker host, trunk branch, count of ready issues, the first 3 issue titles, the log path. Nothing more.
3. **Ask one question:** "Start? Reply `go`." This is the *only* human-in-the-loop checkpoint: kicking off AFK work is a real commitment and deserves an explicit OK.
4. **On `go`:** run the loop in the foreground so the user sees live output:
```bash
bash .claude/skills/ship-it/loop.sh
```
Do not background it. Do not pipe through anything that buffers. The user can Ctrl-C.
5. **While it runs:** stay silent. Don't interject. Don't "monitor" by re-reading logs in chat — the user has the terminal.
6. **When it exits:** read the final `==== ship-it summary ====` block from the log file, present it once with concrete next steps ("2 issues are `needs-decision` — open them to answer their questions?").
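
The preflight checks in step 1 could be bundled into a single function along these lines. This is a sketch under assumptions: the function name, the check order, and the messages are hypothetical, and only the GitHub (`gh`) path is shown.

```bash
#!/usr/bin/env bash
# Sketch of the preflight as one function. Name, order, and messages are
# assumptions; the Gitea (tea) path is not shown.

preflight() {
  local trunk="${SHIP_IT_TRUNK:-main}" branch ready

  # Working tree must be clean.
  if [[ -n "$(git status --porcelain)" ]]; then
    echo "preflight: working tree is dirty"; return 1
  fi

  # Must be on the trunk branch.
  branch="$(git rev-parse --abbrev-ref HEAD)"
  if [[ "$branch" != "$trunk" ]]; then
    echo "preflight: on '$branch', expected '$trunk'"; return 1
  fi

  # Local trunk must match origin after a fetch.
  git fetch --quiet origin "$trunk" || { echo "preflight: fetch failed"; return 1; }
  if [[ "$(git rev-parse HEAD)" != "$(git rev-parse "origin/$trunk")" ]]; then
    echo "preflight: $trunk differs from origin/$trunk"; return 1
  fi

  # Tracker auth must work.
  gh auth status >/dev/null 2>&1 || { echo "preflight: gh is not authenticated"; return 1; }

  # At least one open issue labelled `slice`.
  ready="$(gh issue list --state open --label slice --json number --jq 'length')"
  if (( ${ready:-0} < 1 )); then
    echo "preflight: no open slice issues"; return 1
  fi

  echo "preflight: ok ($ready open slice issues)"
}
```

Returning on the first failed check keeps the message specific, so the plan in step 2 is only shown when every precondition holds.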
## Following progress
The script logs to stdout AND tees to `.ship-it-logs/run-<RUN_ID>.log`. Tail from another terminal:
```bash
tail -f .ship-it-logs/run-*.log
```
Per-issue detail (everything the headless Claude did for that one issue) is in `.ship-it-logs/iter-<RUN_ID>-<ISSUE>.log` — useful for debugging a failed iteration.
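
A rough sketch of how the two log streams might be wired is shown here. Only the file name patterns come from the text above; the `RUN_ID` format and the helper names (`run_log`, `iter_log`, `status`) are assumptions.

```bash
#!/usr/bin/env bash
# Sketch of the two log streams. RUN_ID format and helper names are
# assumptions; only the file name patterns come from this document.

LOG_DIR="${SHIP_IT_LOG_DIR:-.ship-it-logs}"
RUN_ID="${RUN_ID:-$(date +%Y%m%d-%H%M%S)}"

# Path of the run-wide log and of one issue's detail log.
run_log()  { printf '%s/run-%s.log\n' "$LOG_DIR" "$RUN_ID"; }
iter_log() { printf '%s/iter-%s-%s.log\n' "$LOG_DIR" "$RUN_ID" "$1"; }

# Status line: printed to the terminal and appended to the run log.
status() {
  mkdir -p "$LOG_DIR"
  printf '%s\n' "$*" | tee -a "$(run_log)"
}
```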
Commits land in git as the loop runs. Watch with:
```bash
watch -n 5 'git log --oneline -10 origin/main'
```
## Config (env vars, override before invoking)
| Var | Default | Purpose |
|---|---|---|
| `SHIP_IT_MAX` | 50 | Hard cap on iterations per run |
| `SHIP_IT_MAX_FAIL` | 3 | Consecutive failures before stop |
| `SHIP_IT_TRUNK` | `main` | Trunk branch name |
| `SHIP_IT_TIMEOUT` | `30m` | Per-issue timeout (kills the headless claude) |
| `SHIP_IT_LOG_DIR` | `<repo>/.ship-it-logs` | Where logs go |
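
In a shell script, defaults like these are typically applied with `${VAR:-default}` expansion. The sketch below assumes a hypothetical `load_config` function and assumes the repo root is found with `git rev-parse --show-toplevel`; the real script may do this differently.

```bash
#!/usr/bin/env bash
# Sketch: read each setting from the environment, falling back to the
# documented default. The function name and repo-root lookup are assumptions.

load_config() {
  SHIP_IT_MAX="${SHIP_IT_MAX:-50}"
  SHIP_IT_MAX_FAIL="${SHIP_IT_MAX_FAIL:-3}"
  SHIP_IT_TRUNK="${SHIP_IT_TRUNK:-main}"
  SHIP_IT_TIMEOUT="${SHIP_IT_TIMEOUT:-30m}"
  SHIP_IT_LOG_DIR="${SHIP_IT_LOG_DIR:-$(git rev-parse --show-toplevel 2>/dev/null || pwd)/.ship-it-logs}"
}

# Example override for a short trial run (set before invoking the loop):
#   SHIP_IT_MAX=5 SHIP_IT_TIMEOUT=10m bash .claude/skills/ship-it/loop.sh
```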
## What each iteration does (per `iterate.md`)
For one issue: read it → branch from trunk → write failing e2e test at the outermost layer → implement layer by layer until the test passes → run the full suite → outermost-layer smoke check → commit (one commit, message ends `Closes #N`) → push → open PR with acceptance-criteria checkboxes + smoke evidence → wait for CI → merge if green and branch protection allows, else leave open for review → return to trunk → emit `ITERATION_RESULT:` line for the loop.
**Commit per issue:** exactly one. Each slice gets a single commit that references its issue and lands on the branch before the PR opens. `/prd-to-issues` keeps slice scope small precisely so that this is one tight commit, not a series.
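
On the loop side, the `ITERATION_RESULT:` line has to be picked out of the iteration's output. The exact format is defined in `iterate.md` and is not reproduced here, so this sketch assumes a hypothetical shape of `ITERATION_RESULT: <status> [detail]` and a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch: extract the status word from the last ITERATION_RESULT line.
# The line format "ITERATION_RESULT: <status> [detail]" is an assumption.

iteration_result() {
  local log="$1" line status
  line="$(grep '^ITERATION_RESULT:' "$log" | tail -n 1)"
  if [[ -z "$line" ]]; then
    # The headless run ended without reporting; treat as a hard failure.
    echo "missing"
    return 1
  fi
  # Drop the prefix, then keep the first whitespace-delimited word.
  line="${line#ITERATION_RESULT:}"
  read -r status _ <<<"$line"
  echo "$status"
}
```

Taking the last matching line means an iteration that echoes the marker while quoting its own instructions does not confuse the loop, as long as the real result is printed at the end.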
## Stop conditions (in priority order)
1. **User Ctrl-C** → trap catches SIGINT, current step finishes cleanly, summary prints, exit 130.
2. **Backlog empty** (no ready issues) → exit 0.
3. **Three consecutive hard failures** (the `SHIP_IT_MAX_FAIL` default) → exit 1. This usually means something systemic: a bad dependency, branch protection blocking merges, or a flaky environment. The loop stops so a human can review.
4. **Precondition violated mid-run** → exit non-zero with reason.
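
The first three conditions can be sketched as a flag set by a `trap`, a failure streak counter, and one function that decides the exit code in priority order. All names here are hypothetical; the real script's structure may differ.

```bash
#!/usr/bin/env bash
# Sketch of the stop logic. Variable and function names are assumptions.

MAX_FAIL="${SHIP_IT_MAX_FAIL:-3}"
consecutive_failures=0
interrupted=0

# Ctrl-C only sets a flag. Bash runs the trap once the foreground command
# returns, so the loop can print its summary before exiting.
trap 'interrupted=1' INT

# Update the failure streak from one iteration's outcome ("ok" or anything else).
record_outcome() {
  if [[ "$1" == "ok" ]]; then
    consecutive_failures=0
  else
    consecutive_failures=$((consecutive_failures + 1))
  fi
}

# Print the exit code the loop should stop with, or nothing to keep going.
# Checks run in the documented priority order.
stop_code() {
  local ready_count="$1"
  if (( interrupted )); then
    echo 130
  elif (( ready_count == 0 )); then
    echo 0
  elif (( consecutive_failures >= MAX_FAIL )); then
    echo 1
  fi
}
```

Resetting the counter on every success is what makes the threshold "consecutive": isolated failures spread across a long run never stop the loop.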
## What "ready" means (the loop's filter)
An issue is `ready` iff:
- State is open
- Has label `slice` (filed by `/prd-to-issues`)
- Does NOT have label `blocked`, `needs-decision`, or `ci-failed`
- Is not a spike (spikes deliver decisions, not code — humans handle those)
Issues are processed in number order — walking-skeleton first, as `/prd-to-issues` ordered them.
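
The filter above can be expressed as a small text filter over issue numbers and labels. In this sketch the input format (one `<number><TAB><labels>` line per issue), the function names, and the use of a `spike` label to mark spikes are all assumptions; the document does not say how spikes are identified.

```bash
#!/usr/bin/env bash
# Sketch of the "ready" filter. Input: one issue per line,
# "<number><TAB><comma-separated labels>". The `spike` label is an assumption.

filter_ready() {
  awk -F'\t' '
    {
      n = split($2, labels, ",")
      has_slice = 0; excluded = 0
      for (i = 1; i <= n; i++) {
        if (labels[i] == "slice") has_slice = 1
        if (labels[i] ~ /^(blocked|needs-decision|ci-failed|spike)$/) excluded = 1
      }
      if (has_slice && !excluded) print $1
    }' | sort -n   # number order: walking skeleton first
}

# Hypothetical glue for GitHub: open issues in, ready issue numbers out.
list_ready_issues() {
  gh issue list --state open --limit 200 --json number,labels \
    --jq '.[] | "\(.number)\t\([.labels[].name] | join(","))"' | filter_ready
}
```

Keeping the filter separate from the tracker call means the same rule can be applied to `tea` output once it is reshaped into the same two columns.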
## Safety boundaries
The headless Claude is launched with a tool allowlist that excludes destructive operations. It cannot:
- Force-push or rewrite shared history
- Bypass branch protection or skip CI hooks (`--no-verify`, `--admin`)
- Auto-merge red or pending PRs (the iterate prompt forbids it, and CI gates back it up)
- Modify CI/CD config or IaC unless the slice's `Slice — layers touched` line explicitly names that layer
- Close issues without the outermost-layer smoke check passing
- Assign people or change milestones/projects
If something tries to push past these in practice (e.g. a slice "needs" a CI change to pass), it should fail the iteration with `needs-decision` and let a human approve the scope expansion.
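
As an illustration of what some of these boundaries look like at the command level, a string check such as the one below would flag the flags named above. It is a hypothetical helper, not how the tool allowlist is implemented, and it is no substitute for the allowlist or for branch protection.

```bash
#!/usr/bin/env bash
# Illustrative guard only (hypothetical helper): returns 0 when a command
# line uses a flag the safety boundaries rule out, 1 otherwise.

is_forbidden() {
  local cmd=" $* "
  case "$cmd" in
    *" git push"*" --force"*)       return 0 ;;  # force-push (also --force-with-lease)
    *" git push"*" -f "*)           return 0 ;;  # short form of --force
    *" --no-verify "*)              return 0 ;;  # skips git hooks
    *" gh pr merge"*" --admin "*)   return 0 ;;  # bypasses branch protection
  esac
  return 1
}
```

For example, `is_forbidden git push --force origin main` succeeds (the command is flagged), while `is_forbidden git push origin feature` returns 1.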
## What not to do
- **Don't drive the loop yourself by reading issues and implementing them inline.** The shell is the loop. If you're tempted to "just do this one in chat," stop and run the script.
- **Don't background the script** so the user can keep chatting with you. The output IS the value. The user wants to watch it work.
- **Don't summarize between iterations.** Chatter belongs in the final summary, not after each commit.
- **Don't tag the user in PR/issue comments** during the run. They're not in the loop until the script exits.
- **Don't restart a failed iteration manually.** The loop's `needs-decision` and `ci-failed` labels are how failures stay in the tracker for human triage. A manual restart bypasses that triage.
## How this fits the chain
`/grill-me <feature>` (together) → `/prd` (together) → `/prd-to-issues` (mostly together, file step needs `create`) → `/ship-it` (AFK). The four-skill arc takes a vague feature idea to merged code with one human checkpoint per phase boundary.