The Role of AI Agents in Network Automation
- Alex Cronin


Over the last year, the networking industry has seen an explosion of interest in “agentic AI.” Many vendors now describe their products as agentic or AI-powered. The marketing is strong, the enthusiasm is high, and the terminology is being stretched in every direction.
From my conversations, many network engineers want to understand what is real, what is aspirational, and what is essentially an evolution of the same workflow engines we have used for years.
This article is an attempt to clarify what “agentic” actually means in the context of networking, how it relates to AI-native operations, and where Nanites fits into that picture.
AI-Assisted vs AI-Native Network Automation
In most environments today, AI is an extra pair of hands, not the primary interface.
In an AI-assisted model, the engineer still drives everything: you jump between monitoring dashboards, NMS screens, ticketing systems, and router CLIs. You decide which BGP neighbors, interfaces, or VRFs to inspect. You ask an LLM to summarize logs, explain an error, or draft a command sequence.
AI speeds up pieces of the workflow, but the engineer is still the one thinking through the problem, navigating between tools, correlating what they find, and deciding what action to take.
In an AI-native model, the interaction flips. The engineer describes the outcome they care about, and an AI system takes responsibility for figuring out how to interrogate the network. The engineer reviews and approves, but the system handles the investigation.
| AI-Assisted | AI-Native |
| --- | --- |
| Engineer drives investigation | Agent drives investigation |
| AI helps with individual tasks | AI owns the workflow |
| Human correlates findings | Agent correlates findings |
| Tool-by-tool navigation | Intent-based requests |
| "Summarize these logs" | "Find why this path is slow" |
Instead of manually pivoting between disparate systems, tools, and data sources such as SNMP graphs, flow records, router CLI, and change logs, you might say: “Figure out why traffic to the payments service is jittery from Europe” or “Identify the most likely cause of these intermittent drops in the leaf-spine fabric.” An agentic system then chooses which devices and tools to query, pulls telemetry, counters, logs, and config history, tries different lines of investigation, and comes back with a reasoned explanation and a proposed fix.
This isn’t just about speed. When agents handle the repetitive inspection and correlation work, engineers can focus more on architecture, safety, and policy, rather than spending hours in screen-hopping mode.
To operate this way, you need agents that carry context across an entire investigation, not isolated LLM queries that forget everything between prompts. LLMs can assist individual tasks, but agents are what maintain context, move across tools, and execute multi-step workflows without being micromanaged.
What Agentic Really Means in Networking
Most network automation platforms today were designed around deterministic, rule-based execution: templates, policy models, validation engines, drift checks, configuration generation, and preprogrammed remediation runbooks.
These systems depend on structured inputs and predefined sequences of actions. They assume the network state is known in advance or can be validated against an expected model. They are very good at building, pushing, and validating configurations. "AI-powered" today usually means the AI helps generate templates or run predefined health checks. The underlying logic is still static. The system still follows a script written by a human. It cannot adapt when the situation doesn't match the expected pattern.
A true agentic system behaves more like a skilled network engineer than a template engine. At a minimum, it must be able to:
- Observe live network state directly (CLI, APIs, telemetry, TSDB, etc.).
- Form hypotheses based on incomplete or messy information.
- Select appropriate tools or commands to test those hypotheses.
- Gather new data, notice contradictions, and adjust its plan.
- Iterate until it arrives at a root cause and a remediation path.
- Recommend or execute the correct fix under guardrails.
- Validate that the network actually returned to a healthy state.
- Document its reasoning and evidence along the way.
In other words, an agent should be able to take a high-level directive like “traffic to this application is slow” or “payments keep failing,” break it into sub-questions, decide which tools and data sources to use (dashboards, logs, traces, config history, feature flags, etc.), and converge on an answer without a predefined workflow telling it exactly what to do next.
This is the fundamental difference between a workflow executor and an agent. A workflow engine follows a script; an agent discovers the script in real time.
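That loop can be sketched in a few lines. Everything below is hypothetical: `observe` and `next_hypothesis` stand in for real telemetry calls and LLM planning, and the network state is a hard-coded dict rather than live devices.

```python
# Sketch of the observe → hypothesize → test → iterate loop.
# All helpers are hypothetical stand-ins, not a real vendor API.

def observe(network, target):
    """Pull state for one line of investigation (stub: dict lookup)."""
    return network.get(target, {})

def next_hypothesis(findings):
    """Decide what to check next from the evidence so far.
    A real agent would ask an LLM; this stub uses a fixed heuristic."""
    if "bgp" not in findings:
        return "bgp"                      # start with protocol state
    if findings["bgp"].get("state") != "Established" and "config" not in findings:
        return "config"                   # session down: inspect recent config
    return None                           # no open questions left

def investigate(network):
    findings = {}
    while (target := next_hypothesis(findings)) is not None:
        findings[target] = observe(network, target)
    # The root cause emerges from correlated evidence, not a fixed script
    if findings.get("config", {}).get("neighbor_shutdown"):
        return {"root_cause": "neighbor administratively shut down",
                "evidence": findings}
    return {"root_cause": "undetermined", "evidence": findings}

result = investigate({"bgp": {"state": "Idle"},
                      "config": {"neighbor_shutdown": True}})
```

The point of the sketch is the control flow: which checks run is decided at run time by `next_hypothesis`, not by a predefined sequence.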
Why Multi-Agent Systems Are Essential to Making Networking AI-Native
Modern production networks are messy, layered systems. Routing, transport, queues, policies, overlays, services, and security controls all interact in ways that are hard to capture in a single, unified mental model.
When a critical part of the network slows down or fails, the investigation might require, all at once:
- Checking routing and ECMP paths across multiple domains
- Looking at interface errors, discards, and queue depths
- Comparing current flow patterns and traffic distribution against baselines
- Inspecting recent config changes, firmware upgrades, or policy updates
- Reviewing firewall, NAT, and DNS behavior for anomalies
- Understanding which customers, regions, or applications are actually impacted and how that maps to SLAs
Many of these activities require domain-specific expertise and contextual data. As system complexity increases, a single agent (or a single LLM with tools) faces exponential growth in context and decision space. Unlike workflow engines that depend on an accurate source of truth or topology database, an agentic system discovers state directly from the network. It doesn't break when the CMDB is stale or the inventory is incomplete.
Building production-ready multi-agent systems is hard. You need to understand how real network incidents unfold, which signals actually matter when a fabric is degraded, how to interpret inconsistent data from different vendors, and what failure patterns keep recurring. At the same time, you have to solve the hard distributed systems problems: managing context propagation between agents, orchestrating parallel investigations without collisions, handling partial tool failures, and designing guardrails that keep agents from looping.
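One way to picture the orchestration side is a coordinator fanning out per-device agents in parallel and merging their findings. This is a minimal sketch with invented device names and canned outputs, not the Nanites implementation; a production system would add timeouts, partial-failure handling, and loop guardrails.

```python
import asyncio

# Sketch of multi-agent dispatch: one agent per device, run in parallel,
# findings merged by a coordinator. Devices and outputs are invented.

async def device_agent(name, os_type):
    """Each agent picks commands appropriate to its device OS."""
    command = {"ios-xr": "show bgp neighbors",
               "ios-xe": "show ip bgp summary"}[os_type]
    await asyncio.sleep(0)  # stands in for real CLI/API round-trips
    return {"device": name, "command": command, "state": "Idle"}

async def coordinator(inventory):
    # Parallel investigations; results come back together for correlation
    tasks = [device_agent(name, os) for name, os in inventory.items()]
    results = await asyncio.gather(*tasks)
    return {r["device"]: r for r in results}

inventory = {"edge-1": "ios-xr", "edge-2": "ios-xe"}
findings = asyncio.run(coordinator(inventory))
```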
How Nanites Approaches Agentic Network Automation
Nanites was built around the assumption that AI should not just help construct workflows; it should act as the core decision layer inside a controlled, guarded environment.
In practice, the system logs into network devices, uses your existing tools and data sources, observes live and historical state, and iteratively reasons about what is happening. It forms hypotheses about what might be wrong, decides which diagnostics to run, interprets the results even when they are noisy or inconsistent, narrows down likely root causes, and then proposes the right fixes while validating that the network is moving back toward a healthy state.
In the video above, we show what is already possible. We ask Nanites why BGP is down between two Cisco devices, one IOS-XR and the other IOS-XE, in the same network. The system retrieves the device inventory and deploys multiple agents in parallel, each targeting a different device and choosing commands appropriate to its OS. The results come back together: one side shows the session in Active, the other shows Idle with log entries confirming an administrative shutdown. The system correlates these findings, identifies the root cause, and offers to fix it.
When the user approves the fix, the system applies the configuration change, removes the shutdown, and verifies that the session comes back up. Demo note: the remediation step was performed in a controlled lab environment. In production, Nanites operates in read-only mode, and write actions are disabled. Currently, Nanites can only produce the exact commands for an engineer to review and apply.
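The correlation step in that demo can be sketched as a function that merges per-device evidence into one diagnosis. The states and the log line mirror the demo narrative, but the field names and device names below are illustrative, not actual Nanites output.

```python
# Sketch of correlating per-device findings into one root cause.
# Field names, device names, and the log line are invented for illustration.

def correlate(findings):
    """Combine evidence from both BGP peers into a single diagnosis."""
    states = {f["device"]: f["bgp_state"] for f in findings}
    for f in findings:
        if any("admin" in log.lower() and "shutdown" in log.lower()
               for log in f.get("logs", [])):
            return {"root_cause":
                        f"BGP neighbor administratively shut down on {f['device']}",
                    "states": states,
                    "proposed_fix": "remove the shutdown under the neighbor"}
    return {"root_cause": "undetermined", "states": states}

findings = [
    {"device": "edge-xr", "bgp_state": "Active", "logs": []},
    {"device": "edge-xe", "bgp_state": "Idle",
     "logs": ["%BGP-5-ADJCHANGE: neighbor Down Admin. shutdown"]},
]
diagnosis = correlate(findings)
```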
Here is the key: We let AI determine the workflow. No runbook. No playbook. No script. No human told it which commands to run. That is the difference between an AI assistant and an autonomous system.
Under the hood, the system reasoned through the problem the way an engineer would: user asks → agents investigate → system proposes → user approves → Nanites fixes and verifies.
Note: We chose a relatively simple incident so the autonomy is easy to see; we’ve validated the same agentic loop across L1-L3 and telemetry-driven tests (CRC/error-counter analysis, discards/drops, and multi-protocol neighbor and route health).
All of this happens under explicit guardrails with safety and security as our first priority. Nanites is engineered to operate only through approved tools and interfaces; every step is logged, and configuration changes stay under human control through clear approval flows. The goal is not an unconstrained black box, but a system that behaves like a careful engineer at machine speed.
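A minimal sketch of that guardrail pattern, assuming a hypothetical `approve()` callback as the human-in-the-loop gate: reads are always allowed, writes require explicit approval, and every step lands in an audit log. The device names and commands are invented.

```python
# Sketch of approval-gated remediation: reads are free, writes need a
# human yes, and every step is audit-logged. All names are hypothetical.

audit_log = []

def run_readonly(device, command):
    audit_log.append(("read", device, command))
    return f"{device}: {command}"          # stub for real CLI/API output

def remediate(device, commands, approve):
    """approve() is the human gate; nothing is pushed without it."""
    if not approve(device, commands):
        audit_log.append(("denied", device, commands))
        return "proposed only"             # commands shown, never applied
    audit_log.append(("write", device, commands))
    # a real system pushes config here, then re-checks network health
    post_check = run_readonly(device, "show bgp summary")
    return f"applied; verified via: {post_check}"

approved = remediate("edge-xe", ["no neighbor 10.0.0.1 shutdown"],
                     approve=lambda dev, cmds: True)
denied = remediate("edge-xr", ["shutdown"],
                   approve=lambda dev, cmds: False)
```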
We know that trusting an AI to run commands on production infrastructure is a significant leap. Solving this is not just a technical problem. It requires building confidence through transparency, clear boundaries, and proving reliability over time. We are not claiming to have it all figured out, but we believe this is the right direction.
Why This Matters
Runbooks, playbooks, and scripts work when the failure matches a known pattern. Real network incidents rarely do, especially across mixed vendors, OSes, and noisy telemetry. “Vanilla AI” is great for explaining outputs, but it does not own the investigation end-to-end or verify outcomes. MCP is an important integration layer, but it is not the decision layer. Agents are the difference: they can turn a goal into a multi-step investigation across tools, converge on root cause, and follow through with an approved fix and verification.
Why Networking is Uniquely Difficult
Networks are uniquely difficult because you rarely see the full picture. Devices come from different vendors, run different operating systems and versions, and expose different CLIs, APIs, and telemetry formats. Even simple questions like “where is this packet dropping” or “why is this path slow” can require hopping across routers, switches, firewalls, load balancers, and overlays that all describe the world differently.
On top of that, networks are real-time and shared. A subtle misconfiguration or policy change in one place can quietly impact thousands of flows somewhere else. Troubleshooting often happens under pressure, with incomplete data, noisy alerts, and business-critical traffic on the line. That combination of heterogeneity, partial visibility, and high blast radius is what makes networking a notoriously hard domain, and why after decades of scripts and templates, there is still no cohesive solution that really solves it end to end. We believe AI agents that can reason across vendors, tools, and layers will finally start to fix these decades-old problems.
Where the Industry Is Today
Most current “AI automation” products in networking fall into two broad categories:
- AI-assisted deterministic automation: AI helps generate workflows, models, templates, and checks. The underlying engine remains deterministic.
- Analytics with AI summaries: AI summarizes telemetry, anomalies, or alert patterns, but does not yet take full end-to-end action.
Both categories are valuable and, for many teams, exactly the right next step. The reality is that most networking teams are still working through foundational challenges: getting consistent config templates, maintaining accurate sources of truth, and writing basic scripts to automate repetitive tasks. CLI still dominates over NETCONF or RESTCONF wrappers in many environments, and a lot of real operational data still comes from parsing unstructured output. The conversations happening in the industry today sound remarkably similar to the conversations from a decade ago.
Agentic networking is emerging as the intelligence layer on top of this foundation. It will coexist with human engineers and existing tools, not replace them. Much of the real engineering work today is in designing plans, guardrails, tools, and evaluations so that agentic systems behave predictably in messy, real-world networks rather than just in demos.
About Nanites
Nanites is a system of specialized AI agents on call 24/7, helping you resolve network issues in minutes and making networks easier to operate. Nanites is working with leading companies and organizations to build the world’s first AI network autopilot, and has been a featured speaker at CableLabs and NetworkX, mentioned by Cisco, and was a Top 12 finalist in the 2025 T-Mobile T-Challenge. Nanites is in early access with select design partners. We’re validating reliability, guardrails, and workflows in controlled lab and pilot environments before broader production availability.




