Rename eval/ → simulations/ and fix log-write bug
Per discussion: "test" and "eval" overlap in meaning; "simulations" is more honest about what's actually happening — scripted plant inputs driving a physics sim, then recorded for analysis. Rename scope: - eval/ → simulations/ (tracked as git renames) - Internal references in run.js and README.md updated - wiki/modes/mpc.md link updated Also fixes a log-write bug noticed during the rename: - run.js didn't mkdir simulations/logs/ before createWriteStream, so the stream opened into a potentially non-existent dir and the file never materialised. Added fs.mkdirSync(..., recursive:true). - end() wasn't awaited, so the process could exit before the stream flushed. Now awaits the 'finish' event. Confirmed: 1200 records actually land in simulations/logs/<scenario>.jsonl. - Added simulations/logs/.gitignore so future JSONL artefacts stay out of the repo but the dir remains tracked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
123
simulations/README.md
Normal file
123
simulations/README.md
Normal file
@@ -0,0 +1,123 @@
|
||||
# Evaluation harness
|
||||
|
||||
Scenario-based evaluation for pumpingStation. Each scenario scripts a stream of inputs against a configured station, ticks the simulator at 1 s resolution, records every state, and prints a summary + event log + expectation check. Separate from unit tests (`test/`) — those verify individual pieces of logic in isolation; scenarios check end-to-end behaviour over time with realistic input trajectories.
|
||||
|
||||
## Run
|
||||
|
||||
```bash
|
||||
# One scenario
|
||||
node simulations/run.js levelbased-steady
|
||||
|
||||
# All scenarios at once
|
||||
node simulations/run.js --all
|
||||
```
|
||||
|
||||
Per-tick records are written to `simulations/logs/<scenario>.jsonl` for post-hoc analysis (e.g. streaming into InfluxDB for Grafana, or pandas / jq for one-off exploration).
|
||||
|
||||
## Scenario file shape
|
||||
|
||||
```js
|
||||
// simulations/scenarios/<name>.js
|
||||
module.exports = {
|
||||
name: 'scenario-identifier',
|
||||
description: 'one sentence — what the scenario is testing',
|
||||
durationSec: 1200,
|
||||
|
||||
config: { /* PumpingStation config, same shape as nodeClass builds */ },
|
||||
|
||||
setup: async (ps) => {
|
||||
// Optional. Wire fake MGCs, calibrate initial level, etc.
|
||||
},
|
||||
|
||||
inputs: (t, ps) => {
|
||||
// Called every tick (t in seconds). Drive inflow, mode changes,
|
||||
// operator actions, etc.
|
||||
ps.setManualInflow(0.005, Date.now(), 'm3/s');
|
||||
},
|
||||
|
||||
expectations: [
|
||||
{ name: 'no safety trips', type: 'safety_trips_eq', value: 0 },
|
||||
{ name: 'level stays below overflow', type: 'max_level_bounded', value: 4.5 },
|
||||
],
|
||||
};
|
||||
```
|
||||
|
||||
## Supported expectation types
|
||||
|
||||
| Type | Semantics |
|
||||
|---|---|
|
||||
| `max_level_bounded` | max level across the run must be `≤ value` |
|
||||
| `min_level_bounded` | min level across the run must be `≥ value` |
|
||||
| `max_demand_bounded` | max percControl must be `≤ value` |
|
||||
| `safety_trips_eq` | total ticks with `safetyActive` must equal `value` |
|
||||
| `safety_trips_gt` | total ticks with `safetyActive` must be `> value` |
|
||||
| `end_state_eq` | final record's `field` must equal `value` |
|
||||
| `threshold_issues_eq` | startup guardrail issue count must equal `value` |
|
||||
|
||||
Add new expectation types in `run.js` (`evalExpectation`).
|
||||
|
||||
## Output
|
||||
|
||||
Example run:
|
||||
|
||||
```
|
||||
═══ Scenario: levelbased-steady ═══
|
||||
Constant sewer inflow below pump capacity; level converges inside the RAMP zone with demand matching inflow.
|
||||
Duration: 1200s, 1s ticks
|
||||
|
||||
─── Samples (every 10%) ───
|
||||
t(s) level(m) vol(m3) dir netFlow(m3/s) src demand safe
|
||||
────────────────────────────────────────────────────────────────────────────────────────
|
||||
0 2.00 20.00 steady 0 — 0% ·
|
||||
120 2.64 26.40 draining -0.0026 predicted 62% ·
|
||||
240 2.30 23.00 draining -0.0004 predicted 68% ·
|
||||
...
|
||||
|
||||
─── Events (3) ───
|
||||
t= 15s direction steady → filling
|
||||
t= 134s direction filling → draining
|
||||
|
||||
─── Metrics ───
|
||||
level min=2.00 max=2.73 end=2.33 m
|
||||
percControl min=0% max=73% end=66%
|
||||
safety trips=0 ticks
|
||||
threshold issues=0 at startup
|
||||
|
||||
─── Expectations ───
|
||||
✓ no safety trips: 0 ticks with safetyActive (expected 0)
|
||||
✓ level stays below overflow: max level = 2.73 m (bound: ≤ 4.5)
|
||||
✓ level stays above outflow: min level = 2.00 m (bound: ≥ 0.2)
|
||||
✓ no threshold issues on init: 0 threshold issues at startup (expected 0)
|
||||
|
||||
Log: simulations/logs/levelbased-steady.jsonl (1200 records)
|
||||
✅ PASS
|
||||
```
|
||||
|
||||
## Why separate from `test/`?
|
||||
|
||||
| | `test/` | `simulations/` |
|
||||
|---|---|---|
|
||||
| runner | `node --test` | `node simulations/run.js` |
|
||||
| scope | one function / small behaviour | end-to-end scenario over time |
|
||||
| duration | milliseconds | seconds to minutes (simulated) |
|
||||
| assertion style | tight, exact (`assert.equal`) | tolerance / bounds / event counts |
|
||||
| output | TAP | summary table + JSONL for analysis |
|
||||
| purpose | catch regressions | analyse how the system responds to input |
|
||||
|
||||
Unit tests live under `test/basic/`, `test/integration/`, `test/edge/`. Scenarios live here under `simulations/scenarios/`.
|
||||
|
||||
## Sending logs to Grafana (optional)
|
||||
|
||||
The JSONL output has one record per tick. To stream into InfluxDB for Grafana viewing, adapt a small consumer:
|
||||
|
||||
```bash
|
||||
jq -c '{
|
||||
measurement: "pumping_station_eval",
|
||||
tags: { scenario: "'$SCENARIO'" },
|
||||
fields: { level: .level, volume: .volume, demand: .percControl, safety: (.safetyActive|if . then 1 else 0 end) },
|
||||
timestamp: (.t | tonumber | . * 1000000000)
|
||||
}' simulations/logs/$SCENARIO.jsonl \
|
||||
| influx write --bucket=telemetry ...
|
||||
```
|
||||
|
||||
The `t` field is seconds from the scenario start (not wall-clock), so point the Grafana time range at `now() - $duration` after running.
|
||||
Reference in New Issue
Block a user