// research · flagship project

Pathfinder AI.

An agentic vulnerability assessment framework for operational technology (OT) environments. It is built around the PEER lifecycle (Plan → Enumerate → Exploit → Report), a contribution specific to vulnerability assessment and penetration testing (VAPT), with an inner ORDA (Observe → Reason → Decide → Act) control loop driving iteration-level behaviour and a scoped governance layer gating every tool call.

Status

Phases 0–3 complete · end-to-end verified 12 May 2026 · 197 tests passing

Programme

PhD · Keele University

Stack

Python 3.11+ · Ollama (DeepSeek R1 14B) · MCP · Jinja2 · async httpx

Operational technology environments (water treatment plants, power grids, chemical processing) don't tolerate the assumptions IT penetration testing tools were built on. You can't just nmap a programmable logic controller and walk away. Even if the device itself behaves correctly under scan, the network around it may not survive. Real OT engagements are dominated by caution, sequencing, and an explicit allowance for the possibility that the diagnostic itself is the incident.

Existing agentic and generative AI work in security overwhelmingly targets IT systems and treats tool invocation as a low-cost action. For OT, that assumption is wrong. Pathfinder AI is an attempt to build an agent whose decision layer takes consequence seriously — where "do nothing" is a first-class choice, every action is preceded by explicit reasoning, and every step is governed by engagement-bound scope before it touches a tool.

Pathfinder operates at two levels. The Plan–Enumerate–Exploit–Report (PEER) lifecycle structures the assessment workflow, reflecting how practical VAPT engagements actually unfold. Within each PEER phase, an observe–reason–decide–act (ORDA) control loop drives iteration-level behaviour. A governance layer enforces engagement scope before every tool call; structured, hash-chained telemetry records every iteration for post-hoc audit.

peer // outer

Plan → Enumerate → Exploit → Report

The lifecycle a real VAPT engagement actually follows. Each PEER phase has its own goals, allowlisted tools, and exit conditions. The agent does not freelance across phases.

orda // inner

Observe → Reason → Decide → Act

The control loop that runs inside each PEER phase. The reason step is a local LLM (DeepSeek R1 14B via Ollama) producing structured JSON; the decide step gates that proposal against policy before any tool call.
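One iteration of such a loop might look like the sketch below. It is a minimal illustration under stated assumptions, not Pathfinder's implementation: `orda_step`, the proposal keys (`tool`, `args`, `rationale`), and the outcome labels are hypothetical, and the reasoning model is passed in as a plain callable so the example runs without an LLM.

```python
import json
from typing import Any, Callable

Proposal = dict[str, Any]


def orda_step(
    observation: dict[str, Any],
    reason: Callable[[dict[str, Any]], str],
    is_permitted: Callable[[Proposal], bool],
    tools: dict[str, Callable[..., Any]],
) -> dict[str, Any]:
    """Run one observe -> reason -> decide -> act iteration."""
    # Reason: the model is expected to answer with a JSON object.
    try:
        proposal = json.loads(reason(observation))
    except json.JSONDecodeError:
        return {"outcome": "rejected", "why": "proposal was not valid JSON"}
    if not isinstance(proposal, dict):
        return {"outcome": "rejected", "why": "proposal was not a JSON object"}

    tool = proposal.get("tool")
    # "Do nothing" is a legitimate decision, not an error.
    if tool is None:
        return {"outcome": "no_action", "why": proposal.get("rationale", "")}

    args = proposal.get("args", {})
    if not isinstance(args, dict):
        return {"outcome": "rejected", "why": "args must be a JSON object"}

    # Decide: gate the proposal before anything is executed.
    if not isinstance(tool, str) or tool not in tools or not is_permitted(proposal):
        return {"outcome": "denied", "tool": tool}

    # Act: only reached for known, permitted tools.
    result = tools[tool](**args)
    return {"outcome": "executed", "tool": tool, "result": result}


# Usage with stand-ins for the model and the tool (both hypothetical).
def stub_reason(observation: dict[str, Any]) -> str:
    return json.dumps({"tool": "nmap_scan", "args": {"target": "192.0.2.10"}})


def stub_scan(target: str) -> dict[str, Any]:
    return {"target": target, "open_ports": []}


outcome = orda_step({}, stub_reason, lambda p: True, {"nmap_scan": stub_scan})
```

The point of the shape is that the model's output is only ever a proposal: malformed output, an unknown tool, or a policy refusal all end the iteration before the act step is reached.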

governance // gating

ScopedPolicy

Engagement-bound scope loaded from YAML. Tool allowlists, action-class filters, hard prohibitions per phase (no Modbus writes, no destructive payloads, no persistence). Default-deny.
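A default-deny check of this kind can be sketched as follows. This is an illustration only: `PolicySketch`, its field names, and the action-class labels are hypothetical, the in-scope network list is an assumption about what engagement scope contains, and the policy is built from a plain dict standing in for parsed YAML so the example needs only the standard library.

```python
import ipaddress
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PolicySketch:
    allowed_tools: frozenset[str]
    prohibited_actions: frozenset[str]
    in_scope: tuple[Any, ...]  # ipaddress network objects

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "PolicySketch":
        """Build a policy from a dict, e.g. the result of parsing YAML."""
        return cls(
            allowed_tools=frozenset(raw.get("allowed_tools", [])),
            prohibited_actions=frozenset(raw.get("prohibited_actions", [])),
            in_scope=tuple(ipaddress.ip_network(n) for n in raw.get("in_scope", [])),
        )

    def permits(self, tool: str, action_class: str, target: str) -> tuple[bool, str]:
        """Return (allowed, reason). Anything not explicitly allowed is denied."""
        # Hard prohibitions win over everything else.
        if action_class in self.prohibited_actions:
            return False, f"prohibited action class: {action_class}"
        if tool not in self.allowed_tools:
            return False, f"tool not allowlisted: {tool}"
        try:
            addr = ipaddress.ip_address(target)
        except ValueError:
            return False, "target is not a valid IP address"
        if not any(addr in net for net in self.in_scope):
            return False, f"target out of scope: {target}"
        return True, "permitted"


policy = PolicySketch.from_dict(
    {
        "allowed_tools": ["nmap_scan", "modbus_read"],
        "prohibited_actions": ["modbus_write", "persistence"],
        "in_scope": ["192.0.2.0/24"],  # RFC 5737 documentation range
    }
)
```

Note that an empty policy permits nothing: every allowance has to be written into the engagement file, which is what default-deny means in practice.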

audit // persistence

Hash-chained telemetry

JsonLinesTelemetry writes one structured event per loop step, each line including a SHA-256 hash of the previous line. Silent edits to the audit trail are detectable; an action can be reconstructed and challenged without re-running the agent.

A representative trace of the Pathfinder agent loop. The agent receives a reconnaissance goal against an OT-flavoured target, reasons about which tool to use, executes it, and parses the result back into the next reasoning step. Target IP is in the RFC 5737 documentation range — reserved for examples, not a real host.

The governance and audit story is what makes Pathfinder a research contribution rather than another LLM agent wrapper. Three components carry the claim:

Evaluation runs against a purpose-built multi-PLC water treatment testbed — a three-PLC, dual-zone Modbus topology with a Scada-LTS HMI layer, four seeded vulnerabilities (default credentials on two PLCs and the HMI, plus unauthenticated Modbus), and segmented dmz-net / ot-net Docker networks. The testbed is deliberately scoped to address the gap identified in the accompanying systematic review: the absence of agentic VAPT evaluation against realistic OT infrastructure under governance constraints.

Phases 0–3 are complete. The full ORDA loop is implemented and operational; three custom MCP servers (Nmap, HTTP auth, Modbus) are live and governance-scoped; the engagement-loading and policy-enforcement pipeline is end-to-end verified against the live Multi-PLC Water Treatment Testbed, with all seven services discovered under ScopedPolicy governance on 12 May 2026. The repository carries 197 passing tests across the framework. Active development is on Phase 4 (PEER orchestrator) — sequencing all four phases automatically with inter-phase state handover.

Companion documentation includes a Technical Design Document, a Minimum Defensible Thesis paper defining trigger conditions for the fallback plan, and per-phase engagement YAMLs that the policy layer loads at run time. The repository is at github.com/SamTruss/Pathfinder-AI.

The systematic literature review that informs the framework is co-authored with Mohamed Chahine Ghanem (University of Liverpool) and Marcio J. Lacerda (London Metropolitan University), targeting Computers & Security or the Journal of Information Security and Applications. The OSF archive is at osf.io/d7p8j.
