Rename eval/ decision log to simulations/; bump pumpingStation pointer
Some checks failed
CI / lint-and-test (push) Has been cancelled
Follows pumpingStation@3e13512 (rename eval/ → simulations/). The decision log file is renamed to match the new folder name; an addendum in the body explains that the rename was a naming clarification, not a rationale change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -1,8 +1,8 @@
-# DECISION-20260422-pumpingstation-eval-harness
+# DECISION-20260422-pumpingstation-simulations-harness
 
 ## Context
 - Task/request: Provide a way to fluctuate inputs to the pumpingStation and observe the system's response over time, in a readable form suitable for post-hoc analysis (operator review, Grafana, or ad-hoc debugging).
-- Impacted files/contracts: `nodes/pumpingStation/eval/*`, `test/basic/*`.
+- Impacted files/contracts: `nodes/pumpingStation/simulations/*`, `test/basic/*`.
 - Why a decision is required now: Unit tests (`node --test`) verify individual functions in isolation. They can't ergonomically show "what does the level look like over 20 minutes of storm surge". That's a different artefact.
 
 ## Options
@@ -10,7 +10,7 @@
 - Benefits: Single testing surface.
 - Risks: Unit tests are assertion-heavy and slow to read; scenario output (tables, events) gets lost in TAP.
 
-2. Separate `eval/` folder with a scenario runner (selected)
+2. Separate `simulations/` folder with a scenario runner (selected)
 - Benefits: Scenarios read as narratives ("steady state", "storm surge", "safety dry-run"); output is human-friendly (ASCII table + events + expectation checks); JSONL per-tick log enables Grafana streaming or offline analysis.
 - Risks: Second test surface to maintain.
 
@@ -22,14 +22,18 @@
 - Selected option: Option 2.
 - Decision owner: User
 - Date: 2026-04-22
-- Rationale: Unit tests answer "is this function correct?"; evals answer "how does the system behave under this input profile?". Two distinct questions — two distinct tools. The split also matches the .claude/rules/testing.md 3-tier convention (basic/integration/edge) which is for asserted behaviours, not scenario replay.
+- Rationale: Unit tests answer "is this function correct?"; scenarios answer "how does the system behave under this input profile?". Two distinct questions — two distinct tools. The split also matches the .claude/rules/testing.md 3-tier convention (basic/integration/edge) which is for asserted behaviours, not scenario replay.
 
+### Addendum (same-day rename)
+
+Folder was initially named `eval/`. Renamed to `simulations/` in commit pumpingStation@3e13512 — `eval` and `test` are near-synonyms so the split implied a conceptual difference that doesn't really exist. `simulations/` is more honest about what's happening (scripted plant inputs driving a physics sim, recorded for analysis). Rationale above is unchanged; only the folder name is.
+
 ## Architecture
 
 ```
 test/
 basic/ integration/ edge/ — node:test + assertions
-eval/
+simulations/
 run.js — scenario driver
 scenarios/*.js — each exports { name, config, setup, inputs(t,ps), expectations }
 formatters/table.js — ASCII summary
@@ -42,12 +46,12 @@ Driver monkey-patches `Date.now()` so the volume integrator sees 1 second per ti
 ## Consequences
 - Compatibility impact: None.
 - Safety/security impact: None — read-only simulation.
-- Data/operations impact: Running `node eval/run.js --all` produces artefacts that can be checked into CI for regression (e.g. "did the storm scenario's max level rise compared to last release?"). The JSONL format is friendly to InfluxDB/Grafana for interactive review.
+- Data/operations impact: Running `node simulations/run.js --all` produces artefacts that can be checked into CI for regression (e.g. "did the storm scenario's max level rise compared to last release?"). The JSONL format is friendly to InfluxDB/Grafana for interactive review.
 
 ## Implementation Notes
-- Required code/doc updates: Driver + three starter scenarios (`levelbased-steady`, `levelbased-storm`, `safety-dry-run-trip`) + README in `eval/`.
+- Required code/doc updates: Driver + three starter scenarios (`levelbased-steady`, `levelbased-storm`, `safety-dry-run-trip`) + README in `simulations/`.
-- Validation evidence required: `node eval/run.js --all` exits 0; manual inspection of JSONL confirms per-tick records make physical sense.
+- Validation evidence required: `node simulations/run.js --all` exits 0; manual inspection of JSONL confirms per-tick records make physical sense.
 
 ## Rollback / Migration
-- Rollback strategy: Delete `eval/`. Unit tests continue to work.
+- Rollback strategy: Delete `simulations/`. Unit tests continue to work.
 - Migration/deprecation plan: N/A.
Submodule nodes/pumpingStation updated: 66fd3feff8...3e13512a83
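
For readers who have not opened the submodule: the decision log describes scenarios that export `{ name, config, setup, inputs(t,ps), expectations }` and a driver that monkey-patches `Date.now()` so the volume integrator sees one second per tick. The sketch below illustrates that idea only. It is not the code in `simulations/run.js`; `ToyStation`, `runScenario`, `basinArea`, `ticks`, and the expectation shape (`label`, `check`) are all hypothetical names chosen for this example, and the physics is a deliberately trivial level integrator.

```javascript
// Hypothetical sketch; names below are NOT taken from the real simulations/run.js.

// A scenario in the shape the decision log describes:
// { name, config, setup, inputs(t, ps), expectations }.
const stormScenario = {
  name: "sketch-storm",
  config: { ticks: 5, basinArea: 10 }, // basinArea in m^2 (assumed field)
  setup: (ps) => { ps.level = 1.0; },  // initial level in m
  // Inflow in m^3/s as a function of simulated time t (seconds).
  inputs: (t, ps) => ({ inflow: t < 3 ? 2.5 : 0 }),
  expectations: [
    { label: "level stays below 2 m", check: (records) => records.every((r) => r.level < 2) },
  ],
};

// Toy plant: integrates inflow into level using time deltas read from Date.now().
class ToyStation {
  constructor(area) {
    this.area = area;
    this.level = 0;
    this.last = Date.now();
  }
  tick(inflow) {
    const now = Date.now();
    const dt = (now - this.last) / 1000; // seconds since the previous tick
    this.last = now;
    this.level += (inflow * dt) / this.area;
  }
}

function runScenario(scenario) {
  const realNow = Date.now;
  let fakeMs = 0;
  Date.now = () => fakeMs; // patch the clock: the plant now sees simulated time
  try {
    const ps = new ToyStation(scenario.config.basinArea);
    scenario.setup(ps);
    const records = [];
    for (let t = 1; t <= scenario.config.ticks; t++) {
      fakeMs = t * 1000; // advance exactly one simulated second per tick
      const { inflow } = scenario.inputs(t - 1, ps);
      ps.tick(inflow);
      records.push({ t, inflow, level: ps.level });
    }
    const results = scenario.expectations.map((e) => ({ label: e.label, pass: e.check(records) }));
    return { records, results };
  } finally {
    Date.now = realNow; // always restore the real clock
  }
}

const { records, results } = runScenario(stormScenario);
// One JSON object per line (JSONL), one line per tick.
const jsonl = records.map((r) => JSON.stringify(r)).join("\n");
console.log(jsonl);
console.log(results);
```

Restoring `Date.now` in a `finally` block keeps a failing scenario from leaking the fake clock into whatever runs next, which matters if several scenarios run in one process (as an `--all` flag suggests).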