


Sanjay Bhakta
In March 2026, an internal AI agent at Meta triggered a security incident after taking an action it had not been explicitly instructed to perform. It had valid access to the environment it was operating in. It was connected to the right tools, operating within its permissions, and executing a workflow it was designed to support. There was no external breach in the conventional sense. The issue was that the agent interpreted context, made a decision, and acted in a way that exceeded its intended role.
From a system perspective, the behavior was coherent. From a security perspective, it exposed a gap. The controls in place governed access, not intent. They ensured the agent could only interact with approved systems, but they did not constrain how the agent interpreted signals or what combinations of actions it could take once inside the workflow.
Most enterprise AI security practices are built around models that produce outputs. Agentic systems operate inside workflows that produce outcomes. Security has to be structured around that change.
Security must define what an agent is allowed to do, not just what it can access
Security models for vision-language models and large language models are designed to control exposure. They define which data a model can access, how outputs are filtered, and how usage is monitored across users and interfaces. These controls assume that risk is introduced through interaction at the edges of the system.
Agentic AI operates differently. Agents are embedded inside workflows where they take inputs, evaluate conditions, and trigger actions. Security must define which actions the system can take and the exact conditions that allow them.
Consider an agent deployed in a security operations environment to triage potential threats. The agent ingests alerts, correlates them with historical patterns, and determines whether to escalate, suppress, or initiate a response. Governance for this system cannot stop at defining which logs or signals the agent can access. It must define how signals map to decisions, what confidence thresholds trigger escalation, and which actions can occur automatically versus requiring human review.
If those boundaries are not explicit, the agent can take actions that are technically permitted but operationally incorrect. It may suppress alerts that should be investigated or escalate benign activity in ways that disrupt operations. The issue is a lack of control at the task level.
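One way to make these task-level boundaries explicit is to write them down as a policy that maps each signal to the single action it permits. The sketch below is illustrative only: the `TriagePolicy` and `decide` names, the threshold values, and the severity labels are assumptions made for the example, not part of any specific product.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SUPPRESS = "suppress"
    ESCALATE = "escalate"
    AUTO_RESPOND = "auto_respond"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class TriagePolicy:
    # All thresholds are illustrative assumptions, not recommended values.
    suppress_below: float = 0.20    # below this confidence, suppression is allowed
    escalate_at: float = 0.60       # at or above this, the alert is escalated
    auto_respond_at: float = 0.95   # at or above this, automated response is possible
    auto_respond_severities: frozenset = frozenset({"low", "medium"})


def decide(policy: TriagePolicy, confidence: float, severity: str) -> Decision:
    """Map a (confidence, severity) signal to the only action the policy allows."""
    if not 0.0 <= confidence <= 1.0:
        # A malformed signal never maps to an automated action.
        return Decision.HUMAN_REVIEW
    if confidence >= policy.auto_respond_at:
        # Automated response is limited to severities the policy names.
        if severity in policy.auto_respond_severities:
            return Decision.AUTO_RESPOND
        return Decision.HUMAN_REVIEW
    if confidence >= policy.escalate_at:
        return Decision.ESCALATE
    if confidence < policy.suppress_below:
        return Decision.SUPPRESS
    # Ambiguous middle band: neither suppress nor act automatically.
    return Decision.HUMAN_REVIEW
```

Because the mapping is data rather than model behavior, it can be reviewed, versioned, and audited independently of the agent that consumes it.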
Context and memory introduce persistent points of failure
Agentic AI relies on accumulated context to function across tasks. Systems store prior interactions, intermediate decisions, and environmental signals so that future actions reflect more than a single input. This allows agents to operate across workflows, but it also means that errors are not isolated. A compromised or misleading input can influence future behavior long after the initial interaction.
Take an agent responsible for monitoring activity in a public transit system. The agent reviews live video feeds from platforms and entrances, combines that with motion sensors and historical traffic patterns, and identifies conditions that require attention. For example, it may be configured to flag a backpack left unattended for more than three minutes during peak hours, or to detect crowd density that exceeds a defined threshold near stairwells or exits. Over time, it builds a baseline of what normal activity looks like at each station by time of day, day of week, and event schedule.
Now consider what happens if that context is gradually distorted. A series of inputs may indicate that bags left on platforms for longer periods are common and low risk, or that crowd surges near a specific entrance are routine during certain hours when they are not. The agent incorporates those signals into its baseline. It begins to extend the time threshold for unattended objects or raise the tolerance for crowd density before triggering an alert.
At that point, the system is still operating logically. It is applying its rules consistently to the context it has learned. The problem is that the context has shifted away from reality. A bag that should trigger an alert after three minutes may now be ignored for ten. A crowd condition that should prompt intervention may be treated as normal flow.
Each individual decision appears reasonable when viewed in isolation. The risk emerges from the accumulation of those decisions over time, as the system continues to act on a baseline that no longer reflects the conditions it was designed to monitor.
Controls need to address how context is established, how long it persists, and how it is validated. Without those safeguards, the system can continue to operate while gradually moving away from intended behavior.
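A minimal sketch of such a safeguard follows, using the unattended-bag threshold as the example. It assumes a hypothetical `GuardedThreshold` class in which operators fix a design value, learned adjustments are bounded relative to that design value, and learned context expires after a set period. The 25% drift bound and one-week expiry are placeholder assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardedThreshold:
    design_value: float                      # set by operators, e.g. 180 seconds
    max_drift_ratio: float = 0.25            # assumed bound on learned adjustment
    max_age_seconds: float = 7 * 24 * 3600   # assumed expiry for learned context
    learned_value: Optional[float] = None
    learned_at: float = 0.0

    def propose(self, value: float, now: float) -> bool:
        """Accept a learned adjustment only if it stays inside the drift bound."""
        low = self.design_value * (1 - self.max_drift_ratio)
        high = self.design_value * (1 + self.max_drift_ratio)
        if not low <= value <= high:
            # Rejected: the caller should log this and route it to human review.
            return False
        self.learned_value = value
        self.learned_at = now
        return True

    def effective(self, now: float) -> float:
        """Return the learned value while it is fresh, else the design value."""
        if self.learned_value is None:
            return self.design_value
        if now - self.learned_at > self.max_age_seconds:
            return self.design_value
        return self.learned_value


bag_timer = GuardedThreshold(design_value=180.0)
bag_timer.propose(600.0, now=0.0)   # a drift to ten minutes is rejected
bag_timer.propose(200.0, now=0.0)   # a small adjustment is accepted
```

The bound is measured against the operator-defined design value rather than the previous learned value, so a long series of small adjustments cannot compound into a large one.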
Multi-agent workflows require controls at the point of handoff
Agentic systems rarely operate as a single decision point. They are structured as workflows where multiple agents handle sensing, interpretation, decision-making, and execution. This structure enables more complex outcomes, but it also creates dependencies between agents. Each handoff becomes a point where assumptions are transferred.
Consider a traffic management system that uses multiple agents to monitor conditions, identify congestion, and adjust signals. One agent processes sensor data and flags a potential issue. A second agent evaluates the severity and recommends a response. A third agent executes changes to traffic controls.
If the initial signal is incorrect, that error does not remain contained. It is passed along as input to the next agent, which treats it as valid. The workflow reinforces the same assumption at every step. By the time the system acts, the decision reflects a chain of dependent judgments rather than a single point of failure.
Systems need clearly defined agency perimeters and policies, backed by identity context (authentication and authorization), that govern how agents validate inputs from other agents, when uncertainty interrupts the workflow, and where human review is required. Without those controls, errors propagate through coordination rather than appearing as isolated mistakes.
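As a rough illustration, a handoff check might sit between agents and decide whether a message is accepted, sent to human review, or rejected. Everything named here is hypothetical: the `Handoff` fields, the agent names, the allow-list, and the thresholds. The sender is a plain string for brevity; a real deployment would authenticate it rather than trust the field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    sender: str                  # assumed to be authenticated upstream
    kind: str                    # type of message being handed off
    confidence: float            # sender's stated confidence, 0.0 to 1.0
    corroborating_sources: int   # independent sources behind the claim


# Hypothetical allow-list: which agent may send which kind of message.
ALLOWED_SENDERS = {
    "congestion_flag": {"sensor-agent"},
    "signal_plan": {"severity-agent"},
}
MIN_CONFIDENCE = 0.8   # assumed threshold
MIN_SOURCES = 2        # assumed corroboration requirement


def check_handoff(msg: Handoff) -> str:
    """Return 'accept', 'human_review', or 'reject' for an inter-agent message."""
    # Authorization: unknown message kinds and unexpected senders are rejected.
    if msg.sender not in ALLOWED_SENDERS.get(msg.kind, set()):
        return "reject"
    # Uncertainty interrupts the workflow instead of being passed downstream.
    if msg.confidence < MIN_CONFIDENCE or msg.corroborating_sources < MIN_SOURCES:
        return "human_review"
    return "accept"
```

The receiving agent acts only on messages that return `accept`, so an uncorroborated flag from the first agent does not silently become the premise of the third agent's action.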
Tool use requires boundaries on sequences, not just permissions
The ability to use tools allows agentic systems to operate. Agents can update records, trigger workflows, interact with infrastructure, and coordinate across systems. These capabilities are governed today through permissions and access controls.
In agentic systems, that is not sufficient. Exposure emerges from how tools are used in sequence. An agent managing inventory may detect a shortage, place an order, reroute shipments, and update availability across systems. Each of these actions is valid within the system’s permissions. The issue arises when those actions are driven by incorrect assumptions or manipulated inputs, producing an outcome that disrupts operations.
Security controls should draw on the MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome) threat modeling framework to specify not only which tools an agent can access but also how those tools can be combined, and to identify the types of vulnerabilities those combinations create. They must establish constraints on action sequences, define acceptable triggers for each step, and introduce checkpoints where automated execution pauses if conditions are not met.
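Sequence constraints of this kind can be sketched as an allowed-transition table with checkpoints, using the inventory example above. This is not drawn from MAESTRO itself; the step names, the table, and the `validate_sequence` helper are assumptions made for illustration.

```python
# Hypothetical transition table: which step may follow which.
ALLOWED_NEXT = {
    "start": {"detect_shortage"},
    "detect_shortage": {"place_order"},
    "place_order": {"reroute_shipments", "update_availability"},
    "reroute_shipments": {"update_availability"},
    "update_availability": set(),
}

# Steps that pause automated execution until their trigger is verified.
CHECKPOINTS = {"place_order", "reroute_shipments"}


def validate_sequence(steps, verified):
    """Approve steps in order and stop at the first violation.

    Returns (approved_steps, reason), where reason is 'ok' on success.
    `verified` is the set of checkpoint steps whose triggers were confirmed.
    """
    approved = []
    previous = "start"
    for step in steps:
        if step not in ALLOWED_NEXT.get(previous, set()):
            return approved, f"transition {previous} -> {step} not allowed"
        if step in CHECKPOINTS and step not in verified:
            return approved, f"checkpoint: {step} needs verification"
        approved.append(step)
        previous = step
    return approved, "ok"


plan = ["detect_shortage", "place_order", "update_availability"]
approved, reason = validate_sequence(plan, verified={"place_order"})
```

Each tool call remains individually permitted; what the table adds is a limit on the order in which calls may be chained and a pause before the steps that are costly to undo.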
Securing AI agents requires Zero Trust architecture applied at the agent layer: least privilege access scoped to individual tasks, micro-segmentation between agent workloads, and attribute-based access control (ABAC) that enforces context-aware permissions at runtime. Security operations tooling including SIEM, SOAR, IDS, and IPS provides the monitoring and response layer that detects when agents deviate from expected behavior.
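To make the ABAC idea concrete, the sketch below evaluates each request against attribute rules at call time and denies by default. The `Request` fields, the rule format, and the example policies are hypothetical; production systems would typically delegate this to a dedicated policy engine rather than an in-process list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    agent_id: str
    task: str               # the task the agent is currently scoped to
    action: str             # the tool action being requested
    resource_segment: str   # the workload segment the resource lives in
    hour: int               # local hour of the request, 0 to 23


# Hypothetical rules: every attribute must match for the rule to apply.
POLICIES = [
    {"task": "inventory-reorder", "action": "create_order",
     "segment": "procurement", "hours": range(6, 22)},
    {"task": "inventory-reorder", "action": "read_stock",
     "segment": "warehouse", "hours": range(0, 24)},
]


def is_allowed(req: Request) -> bool:
    """Default deny: allow only when some rule matches every attribute."""
    return any(
        req.task == rule["task"]
        and req.action == rule["action"]
        and req.resource_segment == rule["segment"]
        and req.hour in rule["hours"]
        for rule in POLICIES
    )
```

Scoping rules to the task rather than to the agent identity is what keeps privilege minimal: the same agent loses `create_order` as soon as it is working on a different task or reaching into a different segment.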
Physical systems extend the consequences of failure
The impact of agentic AI becomes more significant when it interacts with physical environments. In smart city deployments, transportation systems, and industrial operations, agents are connected to sensors, cameras, and edge devices. They interpret signals and trigger real-world responses.
An agent monitoring activity in a public space may reposition cameras, dispatch drones, or alert response teams based on detected patterns. A system managing road conditions may identify hazards such as debris or potholes and trigger maintenance workflows or traffic adjustments.
In these environments, operational boundaries need to specify the conditions under which action is taken. They must define what level of confidence is required, how conflicting signals are resolved, and when human oversight is necessary. They also need to account for the downstream effects of action, not just the initial decision.
A system that acts on incomplete or incorrect information changes the state of the environment.
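One possible shape for such a boundary is a rule that requires agreement between independent sensors before any physical action and hands conflicts to a person. The `resolve` function, the source names, and the default thresholds below are assumptions for illustration.

```python
def resolve(readings, min_confidence=0.9, min_agreement=2):
    """Decide whether to act on sensor readings about a possible hazard.

    readings: list of (source, hazard_detected, confidence) tuples.
    Returns 'act', 'human_review', or 'no_action'.
    """
    # Readings below the confidence floor are ignored entirely.
    confident = [r for r in readings if r[2] >= min_confidence]
    positives = {source for source, hazard, _ in confident if hazard}
    negatives = {source for source, hazard, _ in confident if not hazard}

    if positives and negatives:
        return "human_review"   # confident sources disagree
    if len(positives) >= min_agreement:
        return "act"            # enough independent sources agree
    if positives:
        return "human_review"   # a hazard is reported but not corroborated
    return "no_action"
```

A single confident detection is treated as a reason to involve a person, not as a trigger, because the action it would justify changes the physical environment.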
Agentic AI security starts with control over behavior
Agentic AI introduces risk at the level of behavior inside operational workflows. Systems can act with valid access, correct permissions, and coherent internal logic while still producing outcomes that fall outside intended boundaries. Failures persist through context, propagate across agents, and scale through tool use.
Agentic systems do not fail in isolation. They fail inside workflows that continue to operate. That makes detection slower, containment harder, and consequences more difficult to reverse.
Agentic AI requires control at the level of task, context, and consequence. It requires defining what an agent is allowed to do, under what conditions, and how its actions are validated as they move through a system.
In the next article, we will examine how governance models can be structured to provide that control, including how to define operational boundaries, validate agent behavior, and introduce checkpoints in workflows designed for autonomous action.
How Centific addresses agentic AI security
Centific’s AIDF platform supports agentic AI governance assessment, while Verity AI addresses agentic AI security implementation. Together they strengthen security posture across several dimensions: Zero Trust Architecture to reduce blast radius, decentralized identity management, dynamic policy-based access controls, authenticated delegation frameworks, continuous monitoring through SIEM, SOAR, IDS, and IPS, adaptive risk assessment and mitigation, agent discovery and trust, and vulnerability and threat management aligned to OWASP, MITRE, and NIST standards.

