Rename eval/ → simulations/ and fix log-write bug

Per discussion: "test" and "eval" overlap in meaning; "simulations" is more honest about what's actually happening: scripted plant inputs drive a physics sim, and the results are recorded for analysis.

Rename scope:
- eval/ → simulations/ (tracked as git renames)
- Internal references in run.js and README.md updated
- wiki/modes/mpc.md link updated

Also fixes a log-write bug noticed during the rename:
- run.js didn't create simulations/logs/ before calling createWriteStream, so whenever the directory was missing the stream had nowhere to open and the file never materialised. Added fs.mkdirSync(..., { recursive: true }).
- end() wasn't awaited, so the process could exit before the stream flushed. Now awaits the 'finish' event. Confirmed: 1200 records now actually land in simulations/logs/<scenario>.jsonl.
- Added simulations/logs/.gitignore so future JSONL artefacts stay out of the repo while the directory itself remains tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
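The log-write fix follows a common Node.js pattern:

1. Create the directory.
2. Write through a stream.
3. Wait for `'finish'` before letting the caller move on (and possibly exit).

Below is a minimal sketch of that pattern as a standalone helper. `writeJsonl` is a hypothetical name, and run.js is not necessarily structured this way:

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { once } = require('node:events');

// Hypothetical helper illustrating the mkdir + await-'finish' fix.
async function writeJsonl(filePath, records) {
  // Without this, the stream emits an ENOENT 'error' when the dir is missing.
  fs.mkdirSync(path.dirname(filePath), { recursive: true });

  const stream = fs.createWriteStream(filePath);
  for (const record of records) {
    // One JSON object per line. Backpressure is ignored for brevity.
    stream.write(JSON.stringify(record) + '\n');
  }
  stream.end();

  // 'finish' fires once all buffered data has been handed to the OS.
  // once() rejects if 'error' is emitted first, so a failed open
  // surfaces as a rejected promise instead of a silently missing file.
  await once(stream, 'finish');
}

module.exports = { writeJsonl };
```

Awaiting `'finish'` rather than just calling `end()` is what guarantees the last records are written before the process is allowed to exit.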
2026-04-22 17:46:10 +02:00
parent 66fd3feff8
commit 3e13512a83
8 changed files with 24 additions and 19 deletions
--- a/simulations/README.md
+++ b/simulations/README.md
@@ -0,0 +1,123 @@
+# Simulation harness
+
+Scenario-based simulations for pumpingStation. Each scenario scripts a stream of inputs against a configured station, ticks the simulator at 1 s resolution, records the state at every tick, and prints a summary, an event log, and the expectation results. This is separate from the unit tests in `test/`: those verify individual pieces of logic in isolation, whereas scenarios check end-to-end behaviour over time under realistic input trajectories.
+
+## Run
+
+```bash
+# One scenario
+node simulations/run.js levelbased-steady
+
+# All scenarios at once
+node simulations/run.js --all
+```
+
+Per-tick records are written to `simulations/logs/<scenario>.jsonl` for post-hoc analysis (e.g. streaming into InfluxDB for Grafana, or pandas / jq for one-off exploration).
+
+## Scenario file shape
+
+```js
+// simulations/scenarios/<name>.js
+module.exports = {
+  name: 'scenario-identifier',
+  description: 'one sentence — what the scenario is testing',
+  durationSec: 1200,
+
+  config: { /* PumpingStation config, same shape as nodeClass builds */ },
+
+  setup: async (ps) => {
+    // Optional. Wire fake MGCs, calibrate initial level, etc.
+  },
+
+  inputs: (t, ps) => {
+    // Called every tick (t in seconds). Drive inflow, mode changes,
+    // operator actions, etc.
+    ps.setManualInflow(0.005, Date.now(), 'm3/s');
+  },
+
+  expectations: [
+    { name: 'no safety trips', type: 'safety_trips_eq', value: 0 },
+    { name: 'level stays below overflow', type: 'max_level_bounded', value: 4.5 },
+  ],
+};
+```
+
+## Supported expectation types
+
+| Type | Semantics |
+|---|---|
+| `max_level_bounded` | max level across the run must be `≤ value` |
+| `min_level_bounded` | min level across the run must be `≥ value` |
+| `max_demand_bounded` | max percControl must be `≤ value` |
+| `safety_trips_eq` | total ticks with `safetyActive` must equal `value` |
+| `safety_trips_gt` | total ticks with `safetyActive` must be `> value` |
+| `end_state_eq` | the final record's value at the key named by the expectation's `field` must equal `value` |
+| `threshold_issues_eq` | startup guardrail issue count must equal `value` |
+
+Add new expectation types in `run.js` (`evalExpectation`).
+
+## Output
+
+Example run:
+
+```
+═══ Scenario: levelbased-steady ═══
+Constant sewer inflow below pump capacity; level converges inside the RAMP zone with demand matching inflow.
+Duration: 1200s, 1s ticks
+
+─── Samples (every 10%) ───
+  t(s)    level(m)   vol(m3)    dir         netFlow(m3/s)   src             demand    safe
+  ────────────────────────────────────────────────────────────────────────────────────────
+       0       2.00     20.00  steady       0               —                 0%       ·
+     120       2.64     26.40  draining    -0.0026          predicted        62%       ·
+     240       2.30     23.00  draining    -0.0004          predicted        68%       ·
+     ...
+
+─── Events (3) ───
+  t=  15s  direction  steady → filling
+  t= 134s  direction  filling → draining
+  ...
+
+─── Metrics ───
+  level       min=2.00  max=2.73  end=2.33 m
+  percControl min=0%    max=73%   end=66%
+  safety      trips=0 ticks
+  threshold   issues=0 at startup
+
+─── Expectations ───
+  ✓ no safety trips: 0 ticks with safetyActive (expected 0)
+  ✓ level stays below overflow: max level = 2.73 m (bound: ≤ 4.5)
+  ✓ level stays above outflow: min level = 2.00 m (bound: ≥ 0.2)
+  ✓ no threshold issues on init: 0 threshold issues at startup (expected 0)
+
+Log: simulations/logs/levelbased-steady.jsonl (1200 records)
+✅ PASS
+```
+
+## Why separate from `test/`?
+
+| | `test/` | `simulations/` |
+|---|---|---|
+| runner | `node --test` | `node simulations/run.js` |
+| scope | one function / small behaviour | end-to-end scenario over time |
+| duration | milliseconds | seconds to minutes (simulated) |
+| assertion style | tight, exact (`assert.equal`) | tolerance / bounds / event counts |
+| output | TAP or spec, depending on reporter | summary table + JSONL for analysis |
+| purpose | catch regressions | analyse how the system responds to input |
+
+Unit tests live under `test/basic/`, `test/integration/`, `test/edge/`. Scenarios live here under `simulations/scenarios/`.
+
+## Sending logs to Grafana (optional)
+
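+The JSONL output has one record per tick. `influx write` expects InfluxDB line protocol rather than JSON, so each record has to be converted first. A small `jq` consumer you can adapt:
+
+```bash
+LOG="simulations/logs/$SCENARIO.jsonl"
+# `t` is seconds since scenario start; shift it so the last record lands at "now".
+LAST_T=$(jq -s 'map(.t | tonumber) | max | floor' "$LOG")
+T0=$(( $(date +%s) - LAST_T ))
+jq -r --arg scenario "$SCENARIO" --argjson t0 "$T0" '
+  "pumping_station_eval,scenario=\($scenario) "
+  + "level=\(.level),volume=\(.volume),demand=\(.percControl),"
+  + "safety=\(if .safetyActive then 1 else 0 end) "
+  + "\($t0 + (.t | tonumber | floor))"
+' "$LOG" \
+  | influx write --bucket=telemetry --precision s ...
+```
+
+The `t` field is seconds from the scenario start, not wall-clock time. The snippet therefore shifts every record so that the run ends at the moment of import. After running, point the Grafana time range at `now() - $duration` to see the whole scenario.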
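The README names `evalExpectation` in `run.js` as the place to add expectation types, but doesn't show it. Below is a rough sketch of how such a dispatcher could map each row of the expectation-types table onto the per-tick records. It rests on these assumptions:

- The field names `level`, `percControl` and `safetyActive` are the ones the Grafana snippet reads.
- The `thresholdIssues` option is a hypothetical way of passing in the startup guardrail count.
- The real function may be structured differently.

```javascript
// Hypothetical sketch of an evalExpectation-style checker, not the run.js code.
// Assumes records look like { t, level, percControl, safetyActive, ... }.
const maxOf = (xs) => xs.reduce((a, b) => Math.max(a, b), -Infinity);
const minOf = (xs) => xs.reduce((a, b) => Math.min(a, b), Infinity);

function evalExpectation(exp, records, { thresholdIssues = 0 } = {}) {
  // Counts ticks with the safety layer active, not distinct trip events.
  const safetyTicks = records.filter((r) => r.safetyActive).length;
  const last = records[records.length - 1];
  let actual;
  let pass;

  switch (exp.type) {
    case 'max_level_bounded':
      actual = maxOf(records.map((r) => r.level));
      pass = actual <= exp.value;
      break;
    case 'min_level_bounded':
      actual = minOf(records.map((r) => r.level));
      pass = actual >= exp.value;
      break;
    case 'max_demand_bounded':
      actual = maxOf(records.map((r) => r.percControl));
      pass = actual <= exp.value;
      break;
    case 'safety_trips_eq':
      actual = safetyTicks;
      pass = actual === exp.value;
      break;
    case 'safety_trips_gt':
      actual = safetyTicks;
      pass = actual > exp.value;
      break;
    case 'end_state_eq':
      // `field` names which key of the final record to compare.
      actual = last ? last[exp.field] : undefined;
      pass = actual === exp.value;
      break;
    case 'threshold_issues_eq':
      actual = thresholdIssues;
      pass = actual === exp.value;
      break;
    default:
      // Fail loudly so a typo in a scenario file can't silently pass.
      throw new Error(`unknown expectation type: ${exp.type}`);
  }
  return { name: exp.name, type: exp.type, actual, pass };
}

module.exports = { evalExpectation };
```

Using `reduce` instead of `Math.max(...xs)` avoids argument-count limits on very long runs.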