Agentic AI Safety for Enterprise Networks

Alex Cronin
Dec 11, 2025
9 min read

🎧 Nanites Podcast: Agentic Safety

The promise of agentic AI is compelling: autonomous systems that can investigate problems, correlate data across tools, and take action without constant human direction. For network operations, this could mean drastically faster troubleshooting, fewer 3 am pages, and engineers freed to focus on high-level architecture instead of screen-hopping through CLIs.

But there is an obvious question that comes up in every conversation with enterprise teams: How do you make sure the AI doesn't break something?

Networks are production infrastructure. A misconfigured interface or an accidental reload can take down services for thousands of users. Trusting an AI system to interact with routers and switches requires deliberate engineering around the assumption that AI agents will hallucinate, will receive adversarial inputs, and will encounter situations they were not trained for. The question is not whether something will go wrong, but whether the system fails safely when it does.

Our Experience Building an Agentic System

Over the past twenty months building Nanites, safety has been our primary engineering focus. The system was architected from day one to enforce safety through deterministic controls rather than relying on the model to behave correctly.

Prompt Engineering Is Not Enough

The most intuitive approach to AI safety is asking the model not to do dangerous things. You could write a system prompt that says "never run configuration commands", but this approach has a fundamental problem: large language models are probabilistic systems, not rule-following engines. They can be manipulated through prompt injection, where malicious text in device output or user input could convince the model to ignore its instructions. They could hallucinate commands that were never requested, or misinterpret ambiguous instructions in unexpected ways. With the right context and prompt engineering, these issues are rare in practice, but they remain possible, and "possible" is not an acceptable risk profile for production infrastructure.

Prompt-level guidance has a role in agentic systems, but that role is alignment, not enforcement. It shapes the model's default behavior and reduces the frequency of undesirable actions, but it is not where security lives.

In practice, with proper context and prompt engineering, models perform well and rarely attempt operations outside their constraints. But the underlying risk remains: prompt injection through device output (banners, MOTD, log lines) is a known attack vector, and no amount of prompt engineering can provide the guarantees that production infrastructure requires. That is why we treat prompt-level guidance as a soft control for alignment in our architecture, while security enforcement happens at the application and device layers.

Deterministic Enforcement

The core principle behind a production-ready safety architecture is straightforward: never trust the model for safety-critical decisions. Every command the agent wants to execute passes through deterministic validation before it reaches any system, and this validation is implemented in application code, not in prompts.

The validation layer sees a string of text and applies deterministic rules: Does this command start with an approved prefix? Does it contain injection characters? Is the target device in the approved inventory? These checks happen in milliseconds, every single time, providing a consistent enforcement layer independent of model output. If a command does not pass the allowlist, it is rejected. If the model were to hallucinate a dangerous command, it would be blocked. If an attacker tried prompt injection through device output, the command would still fail validation.

Multiple Independent Defense Layers

Production safety requires more than any single control; it requires defense-in-depth, where multiple independent layers each provide protection and an attacker would need to defeat all of them simultaneously to cause harm.

Layer	What It Does
Tool Inventory	No write-capable tools exist. The agent cannot call a configuration tool because none is exposed.
Command Validation	Every command must match an allowlist. Dangerous patterns are blocked before reaching any device.
Injection Blocking	Shell metacharacters and command chaining are rejected. No way to sneak in a second command.
Device AAA	Network devices enforce read-only credentials. Even if all application controls fail, the device rejects writes.
Connection Targeting	The agent cannot specify arbitrary hosts. Device targeting is resolved server-side against operator-controlled inventory.
Availability Guardrails	Timeouts, concurrency limits, and output caps prevent resource exhaustion.
Audit Trail	Every command attempt is logged with full attribution, regardless of whether it was allowed or blocked.

The important property is independence. A bug in the allowlist does not matter if device AAA is configured correctly. A misconfigured AAA does not matter if the allowlist is working. An attacker would need to defeat multiple unrelated systems simultaneously.

This concept is borrowed from how critical infrastructure has been secured for decades. Nuclear facilities do not rely on a single safety system. Aircraft do not have one redundant control surface. The principle is the same: assume any individual component can fail, and design so that failure does not cascade into a system-wide catastrophic failure.

Fail-Closed by Default

When the system encounters something unexpected, the safe behavior is to stop. Unknown commands are rejected. Ambiguous inputs are rejected. If a device returns an unexpected prompt, the session terminates rather than attempting to continue.

For systems interacting with production network infrastructure, the safer approach is to stop when encountering unexpected situations, log what happened, and surface it for human review. The instinct to make systems resilient has to be balanced against the reality that improvisation outside tested boundaries is where safety guarantees break down.

Building AI-Native Software Is Different

AI-native software development differs fundamentally from traditional software engineering. In traditional software, you write deterministic code, and if you want different behavior, you change the code. The relationship between code and behavior is direct and predictable.

In AI-native systems, the model is a probabilistic component in the middle of the architecture. You cannot directly specify its behavior; you can only influence it through prompts, examples, and constraints. This means there are knobs to turn at every layer of the system:

Base model selection determines baseline capabilities and failure modes. Different models have different response patterns, and the same prompt produces meaningfully different behavior depending on the underlying model.

System prompts establish the agent's role, constraints, and objectives. These prompts have been refined through hundreds of iterations because even small changes could have significant effects on model behavior.

Orchestrator and sub-agent prompts shape the overall investigation strategy, vendor-specific knowledge, command syntax, and safety rules for each platform.

Tool descriptions determine which tools the model selects for a given task. Poorly written tool descriptions lead to poor tool selection, which leads to failed investigations or inappropriate actions.

Output parsing affects how structured information is extracted from model outputs. Ambiguous parsing leads to cascading errors as malformed data propagates through the system.

Temperature and sampling parameters affect how deterministic or creative the model's responses are. For safety-critical decisions, you want low temperature. For investigation and hypothesis generation, you might want more variance.

Every one of these knobs affects system behavior. Change the system prompt, and the agent might start selecting different tools. Change the tool descriptions, and the orchestrator might deploy agents in a different order.

Thousands of Evals

You cannot reason your way to correct behavior by reading the code; you have to run the system thousands of times and observe what it actually does. Evaluations are a core part of building AI-native systems.

For safety specifically, the evaluation suite covers scenarios like:

Prompt injection through device banners, MOTD, and log output
Command injection through crafted inputs (semicolons, pipes, newlines)
Attempts to escalate privileges or access enable mode
Cross-device pivoting (can an agent assigned to Device A query Device B?)
Resource exhaustion (what happens if a command runs for 10 minutes?)
Credential exposure (does sensitive data leak into logs or responses?)

For each scenario, expected behavior is defined and automated checks verify the system behaves correctly. When something fails, we investigate whether it is a prompt issue, a validation issue, or a model behavior issue, then fix it and add it to the regression suite.

The evaluation process is continuous. Every time a prompt is updated, safety evals run again. Every time a new capability is added, new evals are added. The eval suite grows monotonically; tests are never removed.

Multi-Agent Safety Considerations

Multi-agent architectures, where more than one agent operates within a system, introduce safety challenges that do not exist in single-agent systems.

Scope isolation ensures that agents can only access their assigned resources. An agent assigned to investigate one device should not be able to query a different device, even if the model decides that would be helpful. This requires enforcement at the tool layer, where agents receive device-scoped tool instances that physically cannot target other devices.

Inter-agent communication is a potential injection vector. When agents share information or build on each other's findings, malicious content from one agent's output could attempt to manipulate other agents' behavior and cascade through the process at runtime. Safe architectures treat all agent outputs as untrusted data.

Concurrency control prevents multiple agents from overwhelming devices or management networks. Limits must be enforced globally across all agents, not just within individual agent instances.

Failure handling determines how the system responds when one agent fails while others succeed. The system must handle partial failures gracefully, because partial results are often still useful.

What We Learned Building This

When we started Nanites in early 2024, we knew safety would be critical for production deployments. But to understand what AI agents are truly capable of, we built the full system first: read and write access, configuration changes, remediation workflows, the complete loop.

We needed to see the ceiling. What can an autonomous agent actually do when you remove the constraints? The answer: a lot. The system can diagnose issues, correlate findings across devices, identify root causes, generate fixes, apply them, and verify the network returns to a healthy state, all with minimal human input.

But capability and production-readiness are different things. From the beginning, we knew write operations would stay disabled until the safety engineering caught up. Building the full system showed us what was possible. Constraining it for production showed us what was responsible.

Some things we learned along the way:

Audit everything. You need to know exactly what the system attempted, whether it succeeded, and why. Every tool invocation in Nanites is logged with full attribution. This is how you build confidence over time that the system behaves as expected, and it is also how you investigate when something goes wrong.

Transparency builds trust. Enterprises are rightfully skeptical of black-box AI systems interacting with production infrastructure. We maintain a detailed safety architecture document and review it with customers’ security teams during evaluation, walking through how the guardrails work and what the system is allowed to do.

Iterate relentlessly. System prompts have gone through thousands of revisions. Small wording changes can have meaningful effects on model behavior, so prompts are version-controlled the same way code is, with commit messages explaining why each change was made.

The Current State

Today, Nanites operates in read-only mode in production environments. The agentic system can investigate, diagnose, and recommend actions, but it cannot change device configuration, even with explicit human instructions.

In read-only mode, the worst-case failure scenario is a failed read operation or a verbose show command briefly consuming device CPU. No configuration changes, no outages, no data loss. Write operations have a fundamentally different blast radius, which is why they require additional controls we are still building: approval workflows, rollback capabilities, change-control integration, and additional validation layers.

What This Means for Enterprises

If you are evaluating agentic AI systems for network operations, we believe these are questions that matter:

Question	Why It Matters
What is the worst-case failure scenario?	Understand the blast radius before you deploy
Where is safety enforced?	Look for deterministic controls in application code and device-level enforcement
Can the agent disable its own guardrails?	The model should not be able to modify its own constraints
What happens when something unexpected occurs?	Fail-closed systems stop and log; fail-open systems continue and may cause harm
What is the audit trail?	Every action should be logged with full attribution

Looking Ahead

The industry is early in understanding how to build safe agentic systems. Best practices are still emerging, standards do not exist yet, and every team building in this space is learning as they go.

Safety architectures will likely become a differentiator as agentic AI moves from pilots to production. Enterprises will increasingly ask vendors to prove their systems are safe, not just claim it. Documentation, audit trails, evaluation results, and third-party assessments will become table stakes.

The transition from read-only to read-write will be a major milestone. We believe the first safe steps for write-capable agentic systems will use infrastructure-as-code methods: pre-approved templates, playbooks, and change pipelines executed through semi-autonomous, human-in-the-loop workflows rather than free-form commands on live devices. Closed-loop automation from detection to remediation is the holy grail of IT operations, but it is also where the real risk lies. The teams that prove they can do this safely at “five nines” and beyond will define the next generation of network operations.

About Nanites

Nanites is a system of specialized AI agents on call 24/7, helping you resolve network issues in minutes and making networks easier to operate. Nanites is working with leading companies and organizations to build the world’s first AI network autopilot, and has been a featured speaker at CableLabs and NetworkX, mentioned by Cisco, and was a Top 12 finalist in the 2025 T-Mobile T-Challenge. Nanites is in early access with select design partners. We’re validating reliability, guardrails, and workflows in controlled lab and pilot environments before broader production availability.