---
title: EVOLV Architecture Review
created: 2026-03-01
updated: 2026-04-07
status: evolving
tags: [architecture, stack, review]
---

> **⚠️ ARCHIVED — pre-refactor (Tier 1–4, 2026-05)**
>
> This page describes the architecture before the platform refactor.
> The current page is the per-node wiki on **[gitea.wbd-rd.nl/RnD](https://gitea.wbd-rd.nl/RnD)** or **[Home](../Home)**.
>
> Kept for historical reference only. **Do not update.**


# EVOLV Architecture Review

## Purpose

This document captures:

- the architecture implemented in this repository today
- the broader edge/site/central architecture shown in the drawings under `temp/`
- the key strengths and weaknesses of that direction
- the currently preferred target stack based on owner decisions from this review

It is the local staging document for a later wiki update.

## Evidence Used

Implemented stack evidence:

- `docker-compose.yml`
- `docker/settings.js`
- `docker/grafana/provisioning/datasources/influxdb.yaml`
- `package.json`
- `nodes/*`

Target-state evidence:

- `temp/fullStack.pdf`
- `temp/edge.pdf`
- `temp/CoreSync.drawio.pdf`
- `temp/cloud.yml`

Owner decisions from this review:

- local InfluxDB is required for operational resilience
- central acts as the advisory/intelligence and API-entry layer, not as a direct field caller
- intended configuration authority is the database-backed `tagcodering` model
- architecture wiki pages should be visual, not text-only

## 1. What Exists Today

### 1.1 Product/runtime layer

The codebase is currently a modular Node-RED package for wastewater/process automation:

- EVOLV ships custom Node-RED nodes for plant assets and process logic
- nodes emit both process/control messages and telemetry-oriented outputs
- shared helper logic lives in `nodes/generalFunctions/`
- Grafana-facing integration exists through `dashboardAPI` and Influx-oriented outputs

### 1.2 Implemented development stack

The concrete development stack in this repository is:

- Node-RED
- InfluxDB 2.x
- Grafana

That gives a clear local flow:

1. EVOLV logic runs in Node-RED.
2. Telemetry is emitted in a time-series-oriented shape.
3. InfluxDB stores the telemetry.
4. Grafana renders operational dashboards.

### 1.3 Existing runtime pattern in the nodes

A recurring EVOLV pattern is:

- output 0: process/control message
- output 1: Influx/telemetry message
- output 2: registration/control plumbing where relevant

So even in its current implemented form, EVOLV is not only a Node-RED project. It is already a control-plus-observability platform, with Node-RED as orchestration/runtime and InfluxDB/Grafana as telemetry and visualization services.
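
The three-output convention can be sketched as a plain function that returns one array slot per output, the way a Node-RED Function node does. This is a minimal illustration, not the EVOLV implementation: the function name `buildOutputs`, the topic strings, the measurement name, and the payload fields are all assumptions.

```javascript
// Hypothetical sketch of the three-output convention described above.
// All names (buildOutputs, topics, measurement, fields) are illustrative.
function buildOutputs(assetId, value, timestampMs, isFirstTick) {
  // Output 0: process/control message for downstream logic.
  const processMsg = { topic: `${assetId}/process`, payload: { value } };

  // Output 1: telemetry message in a time-series-oriented shape.
  const telemetryMsg = {
    topic: `${assetId}/telemetry`,
    payload: {
      measurement: "asset_state",
      tags: { asset: assetId },
      fields: { value },
      timestamp: timestampMs,
    },
  };

  // Output 2: registration/control plumbing, emitted only where relevant.
  // In a Node-RED Function node, null means "send nothing on this output".
  const registrationMsg = isFirstTick
    ? { topic: "register", payload: { id: assetId } }
    : null;

  return [processMsg, telemetryMsg, registrationMsg];
}
```

Keeping the telemetry shape separate from the process message is what lets the InfluxDB/Grafana side evolve without touching control logic.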

## 2. What The Drawings Describe

Across `temp/fullStack.pdf` and `temp/CoreSync.drawio.pdf`, the intended platform is broader and layered.

### 2.1 Edge / OT layer

The drawings consistently place these capabilities at the edge:

- PLC / OPC UA connectivity
- Node-RED container as protocol translator and logic runtime
- local broker in some variants
- local InfluxDB / Prometheus-style storage in some variants
- local Grafana/SCADA in some variants

This is the plant-side operational layer.

### 2.2 Site / local server layer

The CoreSync drawings also show a site aggregation layer:

- RWZI-local server
- Node-RED / CoreSync services
- site-local broker
- site-local database
- upward API-based synchronization

This layer decouples field assets from central services and absorbs plant-specific complexity.

### 2.3 Central / cloud layer

The broader stack drawings and `temp/cloud.yml` show a central platform layer with:

- Gitea
- Jenkins
- reverse proxy / ingress
- Grafana
- InfluxDB
- Node-RED
- RabbitMQ / messaging
- VPN / tunnel concepts
- Keycloak in the drawing
- Portainer in the drawing

This is a platform-services layer, not just an application runtime.

## 3. Architecture Decisions From This Review

These decisions now shape the preferred EVOLV target architecture.

### 3.1 Local telemetry is mandatory for resilience

Local InfluxDB is not optional. It is required so that:

- operations continue when central SCADA or central services are down
- local dashboards and advanced digital-twin workflows can still consume recent and relevant process history
- local edge/site layers can make smarter decisions without depending on round-trips to central

### 3.2 Multi-level InfluxDB is part of the architecture

InfluxDB should exist on multiple levels where it adds operational value:

- edge/local for resilience and near-real-time replay
- site for plant-level history, diagnostics, and resilience
- central for fleet-wide analytics, benchmarking, and advisory intelligence

This is not just copy-paste storage at each level. The design intent is event-driven and selective.

### 3.3 Storage should be smart, not only deadband-driven

The target is neither a simple "store every point" policy nor a single fixed deadband rule such as 1%.

The desired storage approach is:

- observe signal slope and change behavior
- preserve points where state is changing materially
- store fewer points where the signal can be reconstructed downstream with sufficient fidelity
- carry enough metadata or conventions so reconstruction quality is auditable

This implies EVOLV should evolve toward smart storage and signal-aware retention rather than naive event dumping.
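
One way to make this concrete is a greedy piecewise-linear filter: a point is dropped only if linear interpolation between the surrounding kept points reproduces every dropped sample within a tolerance. The sketch below is an assumption about how such a rule could look, not the EVOLV algorithm. The names `selectPoints` and `withinTolerance`, the `{ t, v }` point shape, and the tolerance parameter are hypothetical, and timestamps are assumed to be strictly increasing.

```javascript
// Hypothetical sketch: keep only the points needed so that linear
// interpolation between kept points stays within `tolerance` of every
// original sample. Points are { t, v } with strictly increasing t.

// True if all samples strictly between start and end lie within
// tolerance of the straight line from points[start] to points[end].
function withinTolerance(points, start, end, tolerance) {
  const a = points[start];
  const b = points[end];
  for (let i = start + 1; i < end; i++) {
    const ratio = (points[i].t - a.t) / (b.t - a.t);
    const interpolated = a.v + ratio * (b.v - a.v);
    if (Math.abs(points[i].v - interpolated) > tolerance) return false;
  }
  return true;
}

function selectPoints(points, tolerance) {
  if (points.length <= 2) return points.slice();
  const kept = [points[0]];
  let anchor = 0; // index of the most recently kept point
  for (let end = 2; end < points.length; end++) {
    // Extend the current segment while it can still be reconstructed.
    if (!withinTolerance(points, anchor, end, tolerance)) {
      anchor = end - 1; // the previous point is where behavior changed
      kept.push(points[anchor]);
    }
  }
  kept.push(points[points.length - 1]);
  return kept;
}
```

Unlike a fixed deadband, this keeps the corner points of a ramp and drops the flat or steadily sloping samples between them. The tolerance itself would still have to be defined per signal class.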

### 3.4 Central is the intelligence and API-entry layer

Central may advise and coordinate edge/site layers, but external API requests should not hit field-edge systems directly.

The intended pattern is:

- external and enterprise integrations terminate centrally
- central evaluates, aggregates, authorizes, and advises
- site/edge layers receive mediated requests, policies, or setpoints
- field-edge remains protected behind an intermediate layer

This aligns with the stated security direction.
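
The mediation rule can be expressed as a check on the downward command path: every hop must go exactly one layer down. The sketch below assumes a strict adjacent-hop reading of the pattern above. The layer names and the function names `isAllowedHop` and `validateCommandPath` are illustrative assumptions, and upward telemetry flow is deliberately out of scope.

```javascript
// Hypothetical sketch of the downward command boundary.
// Layers are ordered from outermost to innermost.
const LAYERS = ["external", "central", "site", "edge", "field"];

// A hop is allowed only when it goes exactly one layer down.
function isAllowedHop(from, to) {
  const fromIndex = LAYERS.indexOf(from);
  const toIndex = LAYERS.indexOf(to);
  return fromIndex !== -1 && toIndex === fromIndex + 1;
}

// Validates a full command path such as ["external", "central", "site"].
function validateCommandPath(path) {
  for (let i = 0; i < path.length - 1; i++) {
    if (!isAllowedHop(path[i], path[i + 1])) {
      return {
        ok: false,
        reason: `hop ${path[i]} -> ${path[i + 1]} is not a single step down`,
      };
    }
  }
  return { ok: true, reason: "every hop goes through the next layer down" };
}
```

Under this rule a path like `["external", "edge"]` is rejected, while `["external", "central", "site", "edge"]` passes.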

### 3.5 Configuration source of truth should be database-backed

The intended configuration authority is the database-backed `tagcodering` model, which already exists but is not yet complete enough to serve as the fully realized source of truth.

That means the architecture should assume:

- asset and machine metadata belong in `tagcodering`
- Node-RED flows should consume configuration rather than silently becoming the only configuration store
- more work is still needed before this behaves as the intended central configuration backbone
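
While `tagcodering` is still incomplete, a runtime layer could resolve configuration from it first and fall back to flow-embedded values, while reporting which source was used. The sketch below is only an illustration of that idea: the real `tagcodering` schema and access method are not described in this document, so `resolveAssetConfig`, the lookup callback, and every field name are assumptions.

```javascript
// Hypothetical sketch: prefer tagcodering metadata, fall back to the
// configuration embedded in the flow, and make the source visible.
// tagcoderingLookup stands in for a database or API call; it returns
// a plain object for a known asset, or null when no record exists.
function resolveAssetConfig(assetId, tagcoderingLookup, flowConfig) {
  const record = tagcoderingLookup(assetId);
  if (record) {
    // Database-held values win over flow-embedded values.
    return { source: "tagcodering", config: { ...flowConfig, ...record } };
  }
  // No record yet: use the flow config, but report it as such so the
  // temporary duplication stays auditable.
  return { source: "flow", config: { ...flowConfig } };
}
```

Returning the `source` alongside the config keeps it visible which assets still depend on flow-embedded values.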

## 4. Visual Model

### 4.1 Platform topology

```mermaid
flowchart LR
    subgraph OT["OT / Field"]
        PLC["PLC / IO"]
        DEV["Sensors / Machines"]
    end

    subgraph EDGE["Edge Layer"]
        ENR["Edge Node-RED"]
        EDB["Local InfluxDB"]
        EUI["Local Grafana / Local Monitoring"]
        EBR["Optional Local Broker"]
    end

    subgraph SITE["Site Layer"]
        SNR["Site Node-RED / CoreSync"]
        SDB["Site InfluxDB"]
        SUI["Site Grafana / SCADA Support"]
        SBR["Site Broker"]
    end

    subgraph CENTRAL["Central Layer"]
        API["API / Integration Gateway"]
        INTEL["Overview Intelligence / Advisory Logic"]
        CDB["Central InfluxDB"]
        CGR["Central Grafana"]
        CFG["Tagcodering Config Model"]
        GIT["Gitea"]
        CI["CI/CD"]
        IAM["IAM / Keycloak"]
    end

    DEV --> PLC
    PLC --> ENR
    ENR --> EDB
    ENR --> EUI
    ENR --> EBR
    ENR <--> SNR
    EDB <--> SDB
    SNR --> SDB
    SNR --> SUI
    SNR --> SBR
    SNR <--> API
    API --> INTEL
    API <--> CFG
    SDB <--> CDB
    INTEL --> SNR
    CGR --> CDB
    CI --> GIT
    IAM --> API
    IAM --> CGR
```

### 4.2 Command and access boundary

```mermaid
flowchart TD
    EXT["External APIs / Enterprise Requests"] --> API["Central API Gateway"]
    API --> AUTH["AuthN/AuthZ / Policy Checks"]
    AUTH --> INTEL["Central Advisory / Decision Support"]
    INTEL --> SITE["Site Integration Layer"]
    SITE --> EDGE["Edge Runtime"]
    EDGE --> PLC["PLC / Field Assets"]

    EXT -. no direct access .-> EDGE
    EXT -. no direct access .-> PLC
```

### 4.3 Smart telemetry flow

```mermaid
flowchart LR
    RAW["Raw Signal"] --> EDGELOGIC["Edge Signal Evaluation"]
    EDGELOGIC --> KEEP["Keep Critical Change Points"]
    EDGELOGIC --> SKIP["Skip Reconstructable Flat Points"]
    EDGELOGIC --> LOCAL["Local InfluxDB"]
    LOCAL --> SITE["Site InfluxDB"]
    SITE --> CENTRAL["Central InfluxDB"]
    KEEP --> LOCAL
    SKIP -. reconstruction assumptions / metadata .-> SITE
    CENTRAL --> DASH["Fleet Dashboards / Analytics"]
```

## 5. Upsides Of This Direction

### 5.1 Strong separation between control and observability

Node-RED for runtime/orchestration and InfluxDB/Grafana for telemetry is still the right structural split:

- control stays close to the process
- telemetry storage/querying stays in time-series-native tooling
- dashboards do not need to overload Node-RED itself

### 5.2 Edge-first matches operational reality

For wastewater/process systems, edge-first remains correct:

- lower latency
- better degraded-mode behavior
- less dependence on WAN or central platform uptime
- clearer OT trust boundary

### 5.3 Site mediation improves safety and security

Using central as the enterprise/API entry point and site as the mediator improves posture:

- field systems are less exposed
- policy decisions can be centralized
- external integrations do not probe the edge directly
- site can continue operating even when upstream is degraded

### 5.4 Multi-level storage enables better analytics

Multiple Influx layers can support:

- local resilience
- site diagnostics
- fleet benchmarking
- smarter retention and reconstruction strategies

That is substantially more capable than a single central historian model.

### 5.5 `tagcodering` is the right long-term direction

A database-backed configuration authority is stronger than embedding configuration only in flows because it supports:

- machine metadata management
- controlled rollout of configuration changes
- clearer versioning and provenance
- future API-driven configuration services

## 6. Downsides And Risks

### 6.1 Smart storage raises algorithmic and governance complexity

Signal-aware storage and reconstruction is promising, but it creates architectural obligations:

- reconstruction rules must be explicit
- acceptable reconstruction error must be defined per signal type
- operators must know whether they see raw or reconstructed history
- compliance-relevant data may need stricter retention than operational convenience data

Without those rules, smart storage can become opaque and hard to trust.
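
An explicit reconstruction rule can be audited by replaying it against raw data and comparing the worst-case error with a per-class tolerance. The sketch below assumes linear interpolation as the reconstruction rule. The signal classes, the tolerance values in `TOLERANCE_BY_CLASS`, and the function names are placeholders, not agreed EVOLV values.

```javascript
// Hypothetical per-signal-class tolerances (placeholder values).
const TOLERANCE_BY_CLASS = { flow: 0.5, level: 0.02 };

// Reconstructs the value at time t by linear interpolation between
// stored points. `stored` is sorted by t and t lies within its range.
function reconstructAt(stored, t) {
  for (let i = 1; i < stored.length; i++) {
    if (t <= stored[i].t) {
      const a = stored[i - 1];
      const b = stored[i];
      return a.v + ((t - a.t) / (b.t - a.t)) * (b.v - a.v);
    }
  }
  return stored[stored.length - 1].v;
}

// Compares raw samples with their reconstruction and reports whether
// the worst-case error stays within the tolerance for the signal class.
function auditReconstruction(raw, stored, signalClass) {
  const tolerance = TOLERANCE_BY_CLASS[signalClass];
  if (tolerance === undefined) {
    throw new Error(`no tolerance defined for signal class ${signalClass}`);
  }
  let maxError = 0;
  for (const p of raw) {
    maxError = Math.max(maxError, Math.abs(p.v - reconstructAt(stored, p.t)));
  }
  return { signalClass, tolerance, maxError, withinTolerance: maxError <= tolerance };
}
```

A report like this, stored next to the reduced series, is one possible way to show operators whether a trend is raw or reconstructed and how far it may deviate.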

### 6.2 Multi-level databases can create ownership confusion

If edge, site, and central all store telemetry, you must define:

- which layer is authoritative for which time horizon
- when backfill is allowed
- when data is summarized vs copied
- how duplicates or gaps are detected

Otherwise operations will argue over which trend is "the real one."
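
The first of these definitions can be captured as a small policy table that maps data age to the authoritative layer. The horizons below (24 hours at edge, 90 days at site) are invented placeholders to show the shape of such a policy. The actual horizons are an open decision, and `HORIZONS` and `authoritativeLayer` are hypothetical names.

```javascript
// Hypothetical ownership policy: which layer is authoritative for data
// of a given age. The horizon values are placeholders, not decisions.
const HORIZONS = [
  { layer: "edge", maxAgeHours: 24 },
  { layer: "site", maxAgeHours: 24 * 90 },
  { layer: "central", maxAgeHours: Infinity },
];

// Returns the first layer whose horizon still covers the given age.
function authoritativeLayer(ageHours) {
  if (!(ageHours >= 0)) {
    throw new RangeError("ageHours must be a non-negative number");
  }
  return HORIZONS.find((h) => ageHours <= h.maxAgeHours).layer;
}
```

With a table like this, a dashboard or backfill job can look up which store to trust for a time range instead of leaving that to convention.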

### 6.3 Central intelligence must remain advisory-first

Central guidance can become valuable, but a closed control loop that depends directly on central would be risky: a WAN or central outage would then interrupt control.

The architecture should therefore preserve:

- local control authority at edge/site
- bounded and explicit central advice
- safe behavior if central recommendations stop arriving
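
These three properties can be sketched as a single decision function at edge/site: central advice is used only while it is fresh, it is clamped to locally defined bounds, and the local setpoint takes over when advice stops arriving. The function name `effectiveSetpoint`, the advice shape, and the option names are assumptions for illustration.

```javascript
// Hypothetical sketch of advisory-first behavior at edge/site.
// advice: { value, receivedAtMs } or null when nothing has arrived.
// opts:   { maxAgeMs, min, max } defined locally, not by central.
function effectiveSetpoint(localSetpoint, advice, nowMs, opts) {
  const { maxAgeMs, min, max } = opts;

  // Advice counts only while it is recent enough.
  const fresh = advice !== null && nowMs - advice.receivedAtMs <= maxAgeMs;
  if (!fresh) {
    // Safe fallback: local control authority is preserved.
    return { value: localSetpoint, source: "local" };
  }

  // Bounded advice: central can never push outside local limits.
  const bounded = Math.min(max, Math.max(min, advice.value));
  return { value: bounded, source: "central-advice" };
}
```

Returning the `source` with the value also gives a simple audit trail of when central advice was actually applied.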

### 6.4 `tagcodering` is not yet complete enough to lean on blindly

It is the right target, but its current partial state means there is still architecture debt:

- incomplete config workflows
- likely mismatch between desired and implemented schema behavior
- temporary duplication between flows, node config, and database-held metadata

This should be treated as a core platform workstream, not a side issue.

### 6.5 Broker responsibilities are still not crisp enough

The materials still reference MQTT/AMQP/RabbitMQ/brokers without one stable responsibility split. That needs to be resolved before large-scale deployment.

Questions still open:

- command bus or event bus?
- site-only or cross-site?
- telemetry transport or only synchronization/eventing?
- durability expectations and replay behavior?

## 7. Security And Regulatory Positioning

### 7.1 Purdue-style layering is a good fit

EVOLV's preferred structure aligns well with a Purdue-style OT/IT layering approach:

- PLCs and field assets stay at the operational edge
- edge runtimes stay close to the process
- site systems mediate between OT and broader enterprise concerns
- central services host APIs, identity, analytics, and engineering workflows

That is important because it supports segmented trust boundaries instead of direct enterprise-to-field reach-through.

### 7.2 NIS2 alignment

Directive (EU) 2022/2555 (NIS2) requires cybersecurity risk-management measures, incident handling, and stronger governance for covered entities.

This architecture supports that by:

- limiting direct exposure of field systems
- separating operational layers
- enabling central policy and oversight
- preserving local operation during upstream failure

### 7.3 CER alignment

Directive (EU) 2022/2557 (Critical Entities Resilience Directive) focuses on resilience of essential services.

The edge-plus-site approach supports that direction because:

- local/site layers can continue during central disruption
- essential service continuity does not depend on one central runtime
- degraded-mode behavior can be explicitly designed per layer

### 7.4 Cyber Resilience Act alignment

Regulation (EU) 2024/2847 (Cyber Resilience Act) creates cybersecurity requirements for products with digital elements.

For EVOLV, that means the platform should keep strengthening:

- secure configuration handling
- vulnerability and update management
- release traceability
- lifecycle ownership of components and dependencies

### 7.5 GDPR alignment where personal data is present

Regulation (EU) 2016/679 (GDPR) applies whenever EVOLV processes personal data.

The architecture helps by:

- centralizing ingress
- reducing unnecessary propagation of data to field layers
- making access, retention, and audit boundaries easier to define

### 7.6 What can and cannot be claimed

The defensible claim is that EVOLV can be deployed in a way that supports compliance with strict European cybersecurity and resilience expectations.

The non-defensible claim is that EVOLV is automatically compliant purely because of the architecture diagram.

Actual compliance still depends on implementation and operations, including:

- access control
- patch and vulnerability management
- incident response
- logging and audit evidence
- retention policy
- data classification

## 8. Recommended Ideal Stack

The ideal EVOLV stack should be layered around operational boundaries, not around tools.

### 8.1 Layer A: Edge execution

Purpose:

- connect to PLCs and field assets
- execute time-sensitive local logic
- preserve operation during WAN/central loss
- provide local telemetry access for resilience and digital-twin use cases

Recommended components:

- Node-RED runtime for EVOLV edge flows
- OPC UA and protocol adapters
- local InfluxDB
- optional local Grafana for local engineering/monitoring
- optional local broker only when multiple participants need decoupling

Principle:

- edge remains safe and useful when disconnected

### 8.2 Layer B: Site integration

Purpose:

- aggregate multiple edge systems at plant/site level
- host plant-local dashboards and diagnostics
- mediate between raw OT detail and central standardization
- serve as the protected step between field systems and central requests

Recommended components:

- site Node-RED / CoreSync services
- site InfluxDB
- site Grafana / SCADA-supporting dashboards
- site broker where asynchronous eventing is justified

Principle:

- site absorbs plant complexity and protects field assets

### 8.3 Layer C: Central platform

Purpose:

- fleet-wide analytics
- shared dashboards
- engineering lifecycle
- enterprise/API entry point
- overview intelligence and advisory logic

Recommended components:

- Gitea
- CI/CD
- central InfluxDB
- central Grafana
- API/integration gateway
- IAM
- VPN/private connectivity
- `tagcodering`-backed configuration services

Principle:

- central coordinates, advises, and governs; it is not the direct field caller

### 8.4 Cross-cutting platform services

These should be explicit architecture elements:

- secrets management
- certificate management
- backup/restore
- audit logging
- monitoring/alerting of the platform itself
- versioned configuration and schema management
- rollout/rollback strategy

## 9. Recommended Opinionated Choices

### 9.1 Keep Node-RED as the orchestration layer, not the whole platform

Node-RED should own:

- process orchestration
- protocol mediation
- edge/site logic
- KPI production

It should not become the sole owner of:

- identity
- long-term configuration authority
- secret management
- compliance/audit authority

### 9.2 Use InfluxDB by function and horizon

Recommended split:

- edge: resilience, local replay, digital-twin input
- site: plant diagnostics and local continuity
- central: fleet analytics, advisory intelligence, benchmarking, and long-term cross-site views

### 9.3 Prefer smart telemetry retention over naive point dumping

Recommended rule:

- keep information-rich points
- reduce information-poor flat spans
- document reconstruction assumptions
- define signal-class-specific fidelity expectations

This needs design discipline, but it is a real differentiator if executed well.

### 9.4 Put enterprise/API ingress at central, not at edge

This should become a hard architectural rule:

- external requests land centrally
- central authenticates and authorizes
- central or site mediates downward
- edge never becomes the exposed public integration surface

### 9.5 Make `tagcodering` the target configuration backbone

The architecture should be designed so that `tagcodering` can mature into:

- machine and asset registry
- configuration source of truth
- site/central configuration exchange point
- API-served configuration source for runtime layers

## 10. Suggested Phasing

### Phase 1: Stabilize contracts

- define topic and payload contracts
- define telemetry classes and reconstruction policy
- define asset, machine, and site identity model
- define `tagcodering` scope and schema ownership

### Phase 2: Harden local/site resilience

- formalize edge and site runtime patterns
- define local telemetry retention and replay behavior
- define central-loss behavior
- define dashboard behavior during isolation

### Phase 3: Harden central platform

- IAM
- API gateway
- central observability
- CI/CD
- backup and disaster recovery
- config services over `tagcodering`

### Phase 4: Introduce selective synchronization and intelligence

- event-driven telemetry propagation rules
- smart-storage promotion/backfill policies
- advisory services from central
- auditability of downward recommendations and configuration changes

## 11. Immediate Open Questions Before Wiki Finalization

1. Which signals are allowed to use reconstruction-aware smart storage, and which must remain raw or near-raw for audit/compliance reasons?
2. How should `tagcodering` be exposed to runtime layers: direct database access, a dedicated API, or both?
3. What exact responsibility split should EVOLV use between API synchronization and broker-based eventing?

## 12. Recommended Wiki Structure

The wiki should not be one long page. It should be split into:

1. platform overview with the main topology diagram
2. edge-site-central runtime model
3. telemetry and smart storage model
4. security and access-boundary model
5. configuration architecture centered on `tagcodering`

## 13. Next Step

Use this document as the architecture baseline. The companion markdown page in `architecture/` can then be shaped into a wiki-ready visual overview page with Mermaid diagrams and shorter human-readable sections.
 								- machine metadata management
 								- controlled rollout of configuration changes
 								- clearer versioning and provenance
 								- future API-driven configuration services
 								## 6. Downsides And Risks
 								### 6.1 Smart storage raises algorithmic and governance complexity
 								Signal-aware storage and reconstruction is promising, but it creates architectural obligations:
 								- reconstruction rules must be explicit
 								- acceptable reconstruction error must be defined per signal type
 								- operators must know whether they see raw or reconstructed history
 								- compliance-relevant data may need stricter retention than operational convenience data
 								Without those rules, smart storage can become opaque and hard to trust.
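A minimal sketch of how the obligations above could be made checkable, assuming linear interpolation as the reconstruction rule. The class names, tolerances, and the `SignalPolicy` structure are hypothetical and are not taken from the EVOLV codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPolicy:
    """Hypothetical per-signal-class storage policy."""
    max_abs_error: float  # acceptable reconstruction error, in signal units
    raw_required: bool    # True for compliance-relevant signals


# Illustrative classes and tolerances only; real values need owner decisions.
POLICIES = {
    "level": SignalPolicy(max_abs_error=0.05, raw_required=False),
    "effluent_quality": SignalPolicy(max_abs_error=0.0, raw_required=True),
}


def interpolate(kept, t):
    """Linearly reconstruct the value at time t from kept (time, value) points."""
    for (t0, v0), (t1, v1) in zip(kept, kept[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the kept time range")


def max_reconstruction_error(raw, kept):
    """Largest absolute difference between raw values and their reconstruction."""
    return max(abs(v - interpolate(kept, t)) for t, v in raw)


def storage_is_acceptable(signal_class, raw, kept):
    """Check a reduced series against the policy of its signal class."""
    policy = POLICIES[signal_class]
    if policy.raw_required:
        return kept == raw
    return max_reconstruction_error(raw, kept) <= policy.max_abs_error
```

A check like this could run in tests or as a periodic audit, so that the reconstruction error per signal class is measured rather than assumed.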
 								### 6.2 Multi-level databases can create ownership confusion
 								If edge, site, and central all store telemetry, you must define:
 								- which layer is authoritative for which time horizon
 								- when backfill is allowed
 								- when data is summarized vs copied
 								- how duplicates or gaps are detected
 								Otherwise operations will argue over which trend is "the real one."
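As an illustration of the duplicate and gap detection mentioned above, the following sketch scans a list of timestamps. It assumes a regularly sampled series with a known interval; series reduced by smart storage would need a different rule. The function name and parameters are hypothetical.

```python
def find_duplicates_and_gaps(timestamps, expected_interval_s, tolerance_s=0.0):
    """Return (duplicates, gaps) for a list of epoch-second timestamps.

    duplicates: timestamps that occur more than once (each reported once)
    gaps: (previous, current) pairs spaced wider than interval + tolerance
    """
    ordered = sorted(timestamps)
    duplicates = []
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur == prev:
            # Report each duplicated timestamp only once.
            if not duplicates or duplicates[-1] != cur:
                duplicates.append(cur)
        elif cur - prev > expected_interval_s + tolerance_s:
            gaps.append((prev, cur))
    return duplicates, gaps
```

Running the same check at edge, site, and central gives a comparable basis for deciding which layer needs backfill.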
 								### 6.3 Central intelligence must remain advisory-first
 								Central guidance can become valuable, but direct closed-loop dependency on central would be risky.
 								The architecture should therefore preserve:
 								- local control authority at edge/site
 								- bounded and explicit central advice
 								- safe behavior if central recommendations stop arriving
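The three properties above can be sketched as a small selection function. This is an assumption-laden illustration: the `Advice` structure, the time-to-live field, and the clamping bounds are hypothetical names, not an existing EVOLV interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advice:
    """Hypothetical advisory message from central."""
    setpoint: float
    issued_at: float  # epoch seconds
    ttl_s: float      # advice is ignored once older than this


def effective_setpoint(local_setpoint: float, advice: Optional[Advice],
                       now: float, low: float, high: float) -> float:
    """Pick the setpoint the edge/site layer actually applies.

    Local authority: without fresh advice, the local setpoint is used.
    Bounded advice: fresh advice is clamped to [low, high].
    """
    if advice is None or now - advice.issued_at > advice.ttl_s:
        return local_setpoint
    return min(max(advice.setpoint, low), high)
```

If central stops sending recommendations, the last advice expires after `ttl_s` and the function falls back to the local value without any central involvement.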
 								### 6.4 `tagcodering` is not yet complete enough to lean on blindly
 								It is the right target, but its current partial state means there is still architecture debt:
 								- incomplete config workflows
 								- likely mismatch between desired and implemented schema behavior
 								- temporary duplication between flows, node config, and database-held metadata
 								This should be treated as a core platform workstream, not a side issue.
 								### 6.5 Broker responsibilities are still not crisp enough
The materials still reference MQTT, AMQP, RabbitMQ, and generic "brokers" without one stable split of responsibilities. That split needs to be settled before large-scale deployment.
 								Questions still open:
 								- command bus or event bus?
 								- site-only or cross-site?
 								- telemetry transport or only synchronization/eventing?
 								- durability expectations and replay behavior?
## 7. Security And Regulatory Positioning
 								### 7.1 Purdue-style layering is a good fit
 								EVOLV's preferred structure aligns well with a Purdue-style OT/IT layering approach:
 								- PLCs and field assets stay at the operational edge
 								- edge runtimes stay close to the process
 								- site systems mediate between OT and broader enterprise concerns
 								- central services host APIs, identity, analytics, and engineering workflows
 								That is important because it supports segmented trust boundaries instead of direct enterprise-to-field reach-through.
 								### 7.2 NIS2 alignment
 								Directive (EU) 2022/2555 (NIS2) requires cybersecurity risk-management measures, incident handling, and stronger governance for covered entities.
 								This architecture supports that by:
 								- limiting direct exposure of field systems
 								- separating operational layers
 								- enabling central policy and oversight
 								- preserving local operation during upstream failure
 								### 7.3 CER alignment
 								Directive (EU) 2022/2557 (Critical Entities Resilience Directive) focuses on resilience of essential services.
 								The edge-plus-site approach supports that direction because:
 								- local/site layers can continue during central disruption
 								- essential service continuity does not depend on one central runtime
 								- degraded-mode behavior can be explicitly designed per layer
 								### 7.4 Cyber Resilience Act alignment
 								Regulation (EU) 2024/2847 (Cyber Resilience Act) creates cybersecurity requirements for products with digital elements.
 								For EVOLV, that means the platform should keep strengthening:
 								- secure configuration handling
 								- vulnerability and update management
 								- release traceability
 								- lifecycle ownership of components and dependencies
 								### 7.5 GDPR alignment where personal data is present
 								Regulation (EU) 2016/679 (GDPR) applies whenever EVOLV processes personal data.
 								The architecture helps by:
 								- centralizing ingress
 								- reducing unnecessary propagation of data to field layers
 								- making access, retention, and audit boundaries easier to define
 								### 7.6 What can and cannot be claimed
 								The defensible claim is that EVOLV can be deployed in a way that supports compliance with strict European cybersecurity and resilience expectations.
 								The non-defensible claim is that EVOLV is automatically compliant purely because of the architecture diagram.
 								Actual compliance still depends on implementation and operations, including:
 								- access control
 								- patch and vulnerability management
 								- incident response
 								- logging and audit evidence
 								- retention policy
 								- data classification
 								## 8. Recommended Ideal Stack
 								The ideal EVOLV stack should be layered around operational boundaries, not around tools.
### 8.1 Layer A: Edge execution
 								Purpose:
 								- connect to PLCs and field assets
 								- execute time-sensitive local logic
 								- preserve operation during WAN/central loss
 								- provide local telemetry access for resilience and digital-twin use cases
 								Recommended components:
 								- Node-RED runtime for EVOLV edge flows
 								- OPC UA and protocol adapters
 								- local InfluxDB
 								- optional local Grafana for local engineering/monitoring
 								- optional local broker only when multiple participants need decoupling
 								Principle:
 								- edge remains safe and useful when disconnected
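One way to picture "safe and useful when disconnected" is a store-and-forward buffer between the edge and the upstream layer. The sketch below is a simplified in-memory illustration; in the described stack the local InfluxDB would play the durable role. The class name, the bounded size, and the drop-oldest behavior are assumptions.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Hold points while upstream is unreachable and forward them in order."""

    def __init__(self, max_points):
        # Assumption: when full, the oldest pending point is dropped.
        self._pending = deque(maxlen=max_points)

    def record(self, point):
        """Queue a point locally, regardless of upstream availability."""
        self._pending.append(point)

    def flush(self, send):
        """Forward pending points in order; stop at the first failure.

        `send` is a callable returning True on success. Returns the count sent.
        """
        sent = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

Recording never depends on the upstream call, so local operation continues during WAN or central loss and the backlog drains once `send` succeeds again.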
### 8.2 Layer B: Site integration
 								Purpose:
 								- aggregate multiple edge systems at plant/site level
 								- host plant-local dashboards and diagnostics
 								- mediate between raw OT detail and central standardization
 								- serve as the protected step between field systems and central requests
 								Recommended components:
 								- site Node-RED / CoreSync services
 								- site InfluxDB
 								- site Grafana / SCADA-supporting dashboards
 								- site broker where asynchronous eventing is justified
 								Principle:
 								- site absorbs plant complexity and protects field assets
### 8.3 Layer C: Central platform
 								Purpose:
 								- fleet-wide analytics
 								- shared dashboards
 								- engineering lifecycle
 								- enterprise/API entry point
 								- overview intelligence and advisory logic
 								Recommended components:
 								- Gitea
 								- CI/CD
 								- central InfluxDB
 								- central Grafana
 								- API/integration gateway
 								- IAM
 								- VPN/private connectivity
 								- `tagcodering`-backed configuration services
 								Principle:
 								- central coordinates, advises, and governs; it is not the direct field caller
### 8.4 Cross-cutting platform services
 								These should be explicit architecture elements:
 								- secrets management
 								- certificate management
 								- backup/restore
 								- audit logging
 								- monitoring/alerting of the platform itself
 								- versioned configuration and schema management
 								- rollout/rollback strategy
## 9. Recommended Opinionated Choices
### 9.1 Keep Node-RED as the orchestration layer, not the whole platform
 								Node-RED should own:
 								- process orchestration
 								- protocol mediation
 								- edge/site logic
 								- KPI production
 								It should not become the sole owner of:
 								- identity
 								- long-term configuration authority
 								- secret management
 								- compliance/audit authority
### 9.2 Use InfluxDB by function and horizon
 								Recommended split:
 								- edge: resilience, local replay, digital-twin input
 								- site: plant diagnostics and local continuity
 								- central: fleet analytics, advisory intelligence, benchmarking, and long-term cross-site views
### 9.3 Prefer smart telemetry retention over naive point dumping
 								Recommended rule:
 								- keep information-rich points
 								- reduce information-poor flat spans
 								- document reconstruction assumptions
 								- define signal-class-specific fidelity expectations
 								This needs design discipline, but it is a real differentiator if executed well.
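A simple deadband filter illustrates the rule. This is a stand-in technique chosen for brevity, not the retention algorithm EVOLV uses; the function name and the `deadband` parameter are hypothetical. It keeps the first and last point, every point that moves more than `deadband` away from the last kept value, and the final point of each flat span so that linear reconstruction does not smear a step change.

```python
def keep_change_points(points, deadband):
    """Reduce a time-sorted list of (time, value) points.

    Flat spans collapse to their first and last point; changes larger
    than `deadband` relative to the last kept value are always kept.
    """
    if not points:
        return []
    kept_idx = [0]
    for i in range(1, len(points)):
        if abs(points[i][1] - points[kept_idx[-1]][1]) > deadband:
            if kept_idx[-1] != i - 1:
                kept_idx.append(i - 1)  # end of the preceding flat span
            kept_idx.append(i)          # the change point itself
    if kept_idx[-1] != len(points) - 1:
        kept_idx.append(len(points) - 1)  # always keep the latest point
    return [points[i] for i in kept_idx]
```

For a signal that sits at 1.0 for three samples and then steps to 5.0 for three samples, four of the six points survive, and interpolating between them reproduces the original exactly. The deadband per signal class is where the fidelity expectation would be encoded.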
### 9.4 Put enterprise/API ingress at central, not at edge
 								This should become a hard architectural rule:
 								- external requests land centrally
 								- central authenticates and authorizes
 								- central or site mediates downward
 								- edge never becomes the exposed public integration surface
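The rule can be expressed as an allow-list of hops for downward requests, mirroring the command and access boundary diagram in section 4.2. The layer labels and the function name below are illustrative assumptions; a real deployment would enforce this in network policy and gateway configuration rather than in application code alone.

```python
# Allowed downward hops, following the chain
# external -> central -> site -> edge -> plc.
ALLOWED_HOPS = {
    ("external", "central"),
    ("central", "site"),
    ("site", "edge"),
    ("edge", "plc"),
}


def is_allowed_path(path):
    """True if every consecutive hop in the request path is on the allow-list."""
    return len(path) >= 2 and all(
        hop in ALLOWED_HOPS for hop in zip(path, path[1:])
    )
```

With this rule, `["external", "central", "site", "edge"]` is accepted, while `["external", "edge"]` and `["external", "plc"]` are rejected because they skip the mediating layers.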
### 9.5 Make `tagcodering` the target configuration backbone
 								The architecture should be designed so that `tagcodering` can mature into:
 								- machine and asset registry
 								- configuration source of truth
 								- site/central configuration exchange point
 								- API-served configuration source for runtime layers
## 10. Suggested Phasing
 								### Phase 1: Stabilize contracts
 								- define topic and payload contracts
 								- define telemetry classes and reconstruction policy
 								- define asset, machine, and site identity model
 								- define `tagcodering` scope and schema ownership
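To make "topic and payload contracts" concrete, the following sketch shows what a minimal telemetry contract could look like. Every field name, the `quality` values, and the topic layout are hypothetical placeholders for decisions this phase still has to make.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TelemetryMessage:
    """Hypothetical telemetry payload contract."""
    site_id: str
    asset_id: str
    signal: str
    timestamp: float  # epoch seconds
    value: float
    quality: str      # "raw" or "reconstructed"


def parse_telemetry(payload):
    """Validate a dict against the contract and return a typed message."""
    names = [f.name for f in fields(TelemetryMessage)]
    missing = [n for n in names if n not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if payload["quality"] not in ("raw", "reconstructed"):
        raise ValueError("quality must be 'raw' or 'reconstructed'")
    return TelemetryMessage(
        site_id=str(payload["site_id"]),
        asset_id=str(payload["asset_id"]),
        signal=str(payload["signal"]),
        timestamp=float(payload["timestamp"]),
        value=float(payload["value"]),
        quality=payload["quality"],
    )


def topic_for(msg):
    """Build a topic from the identity fields (layout is an assumption)."""
    return f"evolv/{msg.site_id}/{msg.asset_id}/{msg.signal}"
```

Carrying an explicit `quality` marker in the contract is one way to let downstream consumers tell raw history from reconstructed history.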
 								### Phase 2: Harden local/site resilience
 								- formalize edge and site runtime patterns
 								- define local telemetry retention and replay behavior
 								- define central-loss behavior
 								- define dashboard behavior during isolation
 								### Phase 3: Harden central platform
 								- IAM
 								- API gateway
 								- central observability
 								- CI/CD
 								- backup and disaster recovery
 								- config services over `tagcodering`
 								### Phase 4: Introduce selective synchronization and intelligence
 								- event-driven telemetry propagation rules
 								- smart-storage promotion/backfill policies
 								- advisory services from central
 								- auditability of downward recommendations and configuration changes
## 11. Immediate Open Questions Before Wiki Finalization
1. Which signals are allowed to use reconstruction-aware smart storage, and which must remain raw or near-raw for audit/compliance reasons?
2. How should `tagcodering` be exposed to runtime layers: direct database access, a dedicated API, or both?
3. What exact responsibility split should EVOLV use between API synchronization and broker-based eventing?
## 12. Recommended Wiki Structure
 								The wiki should not be one long page. It should be split into:
1. platform overview with the main topology diagram
2. edge-site-central runtime model
3. telemetry and smart storage model
4. security and access-boundary model
5. configuration architecture centered on `tagcodering`
## 13. Next Step
 								Use this document as the architecture baseline. The companion markdown page in `architecture/` can then be shaped into a wiki-ready visual overview page with Mermaid diagrams and shorter human-readable sections.