Rename eval/ → simulations/ and fix log-write bug

Per discussion: "test" and "eval" overlap in meaning; "simulations"
is more honest about what's actually happening — scripted plant
inputs driving a physics sim, then recorded for analysis.

Rename scope:
- eval/ → simulations/ (tracked as git renames)
- Internal references in run.js and README.md updated
- wiki/modes/mpc.md link updated

Also fixes a log-write bug noticed during the rename:
- run.js didn't mkdir simulations/logs/ before createWriteStream,
  so the stream opened into a potentially non-existent dir and the
  file never materialised. Added
  fs.mkdirSync(..., { recursive: true }).
- end() wasn't awaited, so the process could exit before the stream
  flushed. Now awaits the 'finish' event. Confirmed: 1200 records
  actually land in simulations/logs/<scenario>.jsonl.
- Added simulations/logs/.gitignore so future JSONL artefacts stay
  out of the repo but the dir remains tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
znetsixe
2026-04-22 17:46:10 +02:00
parent 66fd3feff8
commit 3e13512a83
8 changed files with 24 additions and 19 deletions


@@ -6,18 +6,18 @@ Scenario-based evaluation for pumpingStation. Each scenario scripts a stream of
```bash
# One scenario
-node eval/run.js levelbased-steady
+node simulations/run.js levelbased-steady
# All scenarios at once
-node eval/run.js --all
+node simulations/run.js --all
```
-Per-tick records are written to `eval/logs/<scenario>.jsonl` for post-hoc analysis (e.g. streaming into InfluxDB for Grafana, or pandas / jq for one-off exploration).
+Per-tick records are written to `simulations/logs/<scenario>.jsonl` for post-hoc analysis (e.g. streaming into InfluxDB for Grafana, or pandas / jq for one-off exploration).
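For one-off exploration without pandas or jq, a log can also be read back in Node. The following is a minimal sketch, not part of the harness: `parseJsonl` and `minField` are hypothetical helper names, the sample data is made up, and the field names `t` and `level` are assumed from the jq example in this README.

```js
// Hypothetical helper: parse JSONL text (one JSON object per line) into records.
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // skip the trailing empty line
    .map((line) => JSON.parse(line));
}

// Hypothetical helper: smallest value of one numeric field across all records.
function minField(records, field) {
  return records.reduce((min, r) => Math.min(min, r[field]), Infinity);
}

// Inline sample standing in for the contents of a <scenario>.jsonl file.
const sampleText = '{"t":0,"level":2.4}\n{"t":1,"level":2.1}\n{"t":2,"level":2.0}\n';
const sampleRecords = parseJsonl(sampleText);

console.log('records:', sampleRecords.length);
console.log('min level:', minField(sampleRecords, 'level'));
```

For a real log, the sample string would be replaced by the result of `fs.readFileSync(logPath, 'utf8')`.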
## Scenario file shape
```js
-// eval/scenarios/<name>.js
+// simulations/scenarios/<name>.js
module.exports = {
name: 'scenario-identifier',
description: 'one sentence — what the scenario is testing',
@@ -89,22 +89,22 @@ Duration: 1200s, 1s ticks
✓ level stays above outflow: min level = 2.00 m (bound: ≥ 0.2)
✓ no threshold issues on init: 0 threshold issues at startup (expected 0)
-Log: eval/logs/levelbased-steady.jsonl (1200 records)
+Log: simulations/logs/levelbased-steady.jsonl (1200 records)
✅ PASS
```
## Why separate from `test/`?
-| | `test/` | `eval/` |
+| | `test/` | `simulations/` |
|---|---|---|
-| runner | `node --test` | `node eval/run.js` |
+| runner | `node --test` | `node simulations/run.js` |
| scope | one function / small behaviour | end-to-end scenario over time |
| duration | milliseconds | seconds to minutes (simulated) |
| assertion style | tight, exact (`assert.equal`) | tolerance / bounds / event counts |
| output | TAP | summary table + JSONL for analysis |
| purpose | catch regressions | analyse how the system responds to input |
-Unit tests live under `test/basic/`, `test/integration/`, `test/edge/`. Scenarios live here under `eval/scenarios/`.
+Unit tests live under `test/basic/`, `test/integration/`, `test/edge/`. Scenarios live here under `simulations/scenarios/`.
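The "tolerance / bounds / event counts" row of the table can be illustrated with a small sketch. This is not the runner's actual expectation API, which this README excerpt does not show: `checkLowerBound` and `countRisingEdges` are hypothetical names, the records are made up, and the fields `level` and `safetyActive` are assumed from the jq example in this README.

```js
// Hypothetical bounds check: passes when the field never drops below `bound`.
function checkLowerBound(records, field, bound) {
  const min = Math.min(...records.map((r) => r[field]));
  return { pass: min >= bound, min, bound };
}

// Hypothetical event count: number of false -> true transitions of a boolean field.
function countRisingEdges(records, field) {
  let count = 0;
  for (let i = 1; i < records.length; i++) {
    if (!records[i - 1][field] && records[i][field]) count++;
  }
  return count;
}

// Made-up per-tick records standing in for a scenario run.
const tickRecords = [
  { t: 0, level: 2.4, safetyActive: false },
  { t: 1, level: 2.0, safetyActive: true },
  { t: 2, level: 2.2, safetyActive: true },
  { t: 3, level: 2.3, safetyActive: false },
];

const levelCheck = checkLowerBound(tickRecords, 'level', 0.2);
const safetyTrips = countRisingEdges(tickRecords, 'safetyActive');
console.log(levelCheck, safetyTrips);
```

Unlike an exact `assert.equal` in a unit test, both checks summarise the whole time series, so small numerical drift in the simulation does not flip the result.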
## Sending logs to Grafana (optional)
@@ -116,7 +116,7 @@ jq -c '{
tags: { scenario: "'$SCENARIO'" },
fields: { level: .level, volume: .volume, demand: .percControl, safety: (.safetyActive|if . then 1 else 0 end) },
timestamp: (.t | tonumber | . * 1000000000)
-}' eval/logs/$SCENARIO.jsonl \
+}' simulations/logs/$SCENARIO.jsonl \
| influx write --bucket=telemetry ...
```


@@ -1,5 +1,5 @@
// ASCII table summary of scenario samples.
-// Used by eval/run.js.
+// Used by simulations/run.js.
function pad(s, n, left = false) {
s = String(s ?? '');

simulations/logs/.gitignore (new file, 2 lines)

@@ -0,0 +1,2 @@
*.jsonl
!.gitignore


@@ -1,14 +1,14 @@
#!/usr/bin/env node
// Scenario runner for pumpingStation. Usage:
//
-// node eval/run.js <scenario> # run one
-// node eval/run.js --all # run all scenarios
+// node simulations/run.js <scenario> # run one
+// node simulations/run.js --all # run all scenarios
//
-// Each scenario lives in eval/scenarios/<name>.js and exports:
+// Each scenario lives in simulations/scenarios/<name>.js and exports:
// { name, description, durationSec, config, setup?, inputs, expectations? }
//
// The runner ticks the station once per simulated second, records every
-// state into eval/logs/<name>.jsonl, prints a summary table + event log,
+// state into simulations/logs/<name>.jsonl, prints a summary table + event log,
// and checks expectations.
const path = require('path');
@@ -102,7 +102,9 @@ async function runScenario(name) {
if (scenario.setup) await scenario.setup(ps);
const duration = scenario.durationSec ?? 600;
-const logPath = path.join(__dirname, 'logs', `${scenario.name}.jsonl`);
+const logDir = path.join(__dirname, 'logs');
+fs.mkdirSync(logDir, { recursive: true });
+const logPath = path.join(logDir, `${scenario.name}.jsonl`);
const log = fs.createWriteStream(logPath);
const records = [];
@@ -115,7 +117,8 @@ async function runScenario(name) {
records.push(snap);
log.write(JSON.stringify(snap) + '\n');
}
-log.end();
+// Drain so the file is fully written before we return.
+await new Promise((resolve, reject) => { log.end(); log.on('finish', resolve); log.on('error', reject); });
return { ps, records, scenario, duration, logPath };
} finally {
@@ -174,7 +177,7 @@ async function runAndReport(name) {
async function main() {
const arg = process.argv[2];
if (!arg) {
-console.error('Usage: node eval/run.js <scenario> | --all');
+console.error('Usage: node simulations/run.js <scenario> | --all');
console.error('Available:', fs.readdirSync(path.join(__dirname, 'scenarios')).map((f) => f.replace(/\.js$/, '')).join(', '));
process.exit(1);
}
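
The mkdir-then-drain pattern from the run.js hunks above can be sketched standalone. This is a variant under stated assumptions, not the code in this commit: `writeJsonl` is a hypothetical helper, the temp directory and sample records are made up, and the listeners are attached before any write so that an early stream error also rejects the promise.

```js
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write records as JSONL and resolve once the stream has flushed.
function writeJsonl(logPath, records) {
  // Create the parent directory first; recursive: true makes this a no-op if it exists.
  fs.mkdirSync(path.dirname(logPath), { recursive: true });
  return new Promise((resolve, reject) => {
    const log = fs.createWriteStream(logPath);
    // Attach listeners before writing so no 'error' or 'finish' event is missed.
    log.on('error', reject);
    log.on('finish', resolve);
    for (const rec of records) log.write(JSON.stringify(rec) + '\n');
    log.end(); // 'finish' fires after all buffered data has been flushed
  });
}

// Demo: the nested 'logs' directory does not exist yet inside the fresh temp dir.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sim-log-'));
const demoPath = path.join(demoDir, 'logs', 'demo.jsonl');
const done = writeJsonl(demoPath, [{ t: 0, level: 2.4 }, { t: 1, level: 2.1 }]);

done.then(() => {
  const lines = fs.readFileSync(demoPath, 'utf8').trim().split('\n');
  console.log('records written:', lines.length);
});
```

Awaiting the returned promise before the process exits is what guarantees the record count reported in the summary matches the file on disk.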


@@ -78,7 +78,7 @@ Blocks:
## Diagram 2 — scenario time-series
-A much more useful way to evaluate MPC is to plot *what it did* over a simulated scenario: level, planned vs actual demand, the cost function breakdown, the active constraints. The [eval harness](../../eval/README.md) is built for exactly this — MPC will need a dedicated scenario like `mpc-storm-with-forecast.js`.
+A much more useful way to evaluate MPC is to plot *what it did* over a simulated scenario: level, planned vs actual demand, the cost function breakdown, the active constraints. The [simulations harness](../../simulations/README.md) is built for exactly this — MPC will need a dedicated scenario like `mpc-storm-with-forecast.js`.
```
Placeholder — replace with:
@@ -146,4 +146,4 @@ demand = plan.command[0]
- [Functional description](../functional-description.md) — basin model + safety layer
- [modes/levelbased.md](levelbased.md) — Tier 1 — the "default" MPC falls back to
- [modes/powerbased.md](powerbased.md) — Tier 2 — MPC generalises the clip idea into full optimisation
-- [eval/README.md](../../eval/README.md) — where MPC evaluation scenarios will live
+- [simulations/README.md](../../simulations/README.md) — where MPC simulation scenarios will live