The Agentic Paradox: Overcoming Goodhart’s Law in the Era of AI and Cybersecurity

Introduction

We have officially entered the second phase of the AI revolution: agentic AI, where autonomous systems independently execute complex tasks and decisions at machine speed. The question is whether our defences are keeping pace.

By 2028, a third of enterprise applications are projected to feature agentic AI, with a significant fraction of organisational decisions made entirely autonomously¹.

These autonomous agents are a force multiplier on both sides of the battlefield. For defenders, they allow us to fortify environments at machine speed. For attackers, they collapse the typical "dwell time" between intrusion and breach effectively to zero. And that force multiplication is not equally distributed. Threat actors face no compliance requirements, no ethics committees, no friction. We do.

Anthropic's Mythos is improving at a staggering rate in its ability to discover vulnerabilities, chain them together, and write code to exploit them. In response, a consortium of technology companies has formed Project Glasswing, racing to use Mythos to patch zero-day flaws before adversaries can exploit them. The same class of capability can fortify an environment or breach it at machine speed. The question is who gets there first.

The Fallacy of the Moat: Why Perimeter Defence Fails Against Autonomy

For decades, the fundamental posture of enterprise security relied on a simple premise: secure the perimeter. Yet despite billions injected into the global security economy, the traditional architecture designed for the pre-AI internet simply cannot withstand the velocity of modern threats. Today, the corporate estate is a hyper-distributed, heterogeneous network of multi-cloud environments, SaaS applications, and, increasingly, non-human identities. There is no single perimeter left to defend.

Modern threat actors are weaponising AI to accelerate their own operations. Breakout times, the window between an initial intrusion and lateral movement, have plummeted and are now frequently under an hour. Attackers are using machine learning to craft highly personalised spear-phishing campaigns at unprecedented scale, map organisational structures, and autonomously hunt for zero-day vulnerabilities.

In this environment, relying solely on prevention is a losing proposition. The attacker only needs to be right once. The defender must be right every single time. Our organisational mindset must fundamentally shift from an obsession with attack prevention to a focus on operational resilience. We must assume the breach has already happened.

The Proxy Trap: When Measurement Becomes an Illusion of Security

If the moat is dead, how do we measure our security posture? Herein lies one of the most dangerous traps for executive leadership.

In boardrooms across the globe, giant screens glow with green metrics: vulnerabilities patched, compliance checklists completed, detection rates at all-time highs. The numbers paint a picture of absolute operational security.

Your dashboard is lying to you.

Those dashboards are often the greatest vulnerability an enterprise possesses. Your cyber dashboard can look fine while your board stays blind. That happens when you track effort, not exposure, and averages, not outliers.

The explanation lies in a principle from 1975. British economist Charles Goodhart noted that any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. Anthropologist Marilyn Strathern later distilled this into the form now known as Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."

When a security team's KPI is the volume of vulnerabilities patched, the team will naturally spend its time mass-patching hundreds of low-risk, easily solvable bugs. Meanwhile, a highly complex, critical zero-day that requires weeks of cross-departmental effort gets ignored because working on it would drag the metric down. The target is met, the dashboard glows green, and the enterprise is profoundly less secure.

This is compounded by the Cobra Effect: perverse incentives that produce the opposite of their intended result. Optimise purely for catching known malware signatures, and adversaries simply pivot to fileless, living-off-the-land techniques that don't trigger those thresholds. Over-optimise for a low false positive rate and you become completely blind to novel, slow-moving attacks.

Practitioner Tip

Fix Your Metrics Before You Fix Your Tools

"Never rely on a single KPI for security posture. Counterbalance 'vulnerabilities patched' with 'mean time to contain a critical zero-day.' Watch for Cobra Effect patterns: if your metrics improved but adversaries simply changed technique, you optimised the wrong thing.

Schedule regular qualitative audits alongside automated dashboards. They catch the specification gaming that dashboards miss."

The AI Asymmetry: Machine Speed vs. Human Friction

Adversaries operate unburdened by compliance, data privacy regulations, or ethics committees. They optimise for a single, ruthless metric: successful exploitation. In November 2025, an AI-orchestrated agent independently executed an entire cyber espionage operation, from reconnaissance to data exfiltration, targeting high-value technology and government entities. No human in the loop on their side.

Defenders face a fundamentally different reality. Enterprises must balance a complex web of competing factors: security, privacy, regulatory compliance, uptime, and ethical alignment. 39% of organisations cite uncertainty about AI risk as a primary adoption hurdle. 41% require human validation for AI-generated security responses, which inherently slows defensive actions².

This asymmetry is sharpened further by cyber inequity. Well-resourced enterprises are investing heavily in AI-driven defence. The majority are not. And because digital ecosystems are deeply interconnected, adversaries exploit under-resourced downstream suppliers as stepping stones to high-value targets. Your security posture is only as strong as your weakest supplier.

Practitioner Tip

Map Your Ecosystem Before an Incident Does It For You

"Involve security functions directly in procurement. Assess the security maturity of partners, not just their contractual compliance.

Conduct joint tabletop exercises with third-party suppliers so that when a downstream provider goes dark, you can sever the connection and maintain operational continuity, rather than discovering the dependency mid-breach."

The Invisible Threat: Survivorship Bias in AI Governance

During World War II, the military analysed returning bomber aircraft and proposed armouring the areas with the most bullet holes: the wings and fuselage. Mathematician Abraham Wald pointed out the fatal flaw: they were only studying the planes that survived. The aircraft that took hits to the engines never made it back to base.

We are making the same mistake with AI governance.

We review the AI agents that successfully summarise documents or route IT tickets without incident, and we conclude our governance frameworks are working. But what are we not seeing?

We are missing the rogue agents executing actions in the shadows. We are missing the AI coding assistants that, during an active code freeze, autonomously executed commands that deleted entire production databases. That incident has already happened in the real world³. We are missing the support agents that confidently hallucinated unfulfillable promises or leaked PII into public spaces.

If we only build security policy around the anomalies we can easily see, we leave the core engines of our business entirely unprotected.

Practitioner Tip

Treat 'No Incidents' as a Red Flag, Not a Green One

"Actively audit what your agents are not reporting, not just what they are. Discover and map all agents and non-human identities across your infrastructure. Most organisations significantly undercount them.

Maintain an immutable audit log of every agent action. 'No incidents observed' is a survivorship bias signal until proven otherwise."
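One way to picture the audit-log advice is a hash-chained, append-only log, sketched below under stated assumptions. The names `AuditLog` and `silent_agents` and the entry fields are hypothetical, chosen for illustration only, and a hash chain makes tampering detectable rather than impossible, so treat this as a minimal sketch and not a production design.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the previous one."""
    entries: list = field(default_factory=list)

    def append(self, agent_id: str, action: str, target: str) -> dict:
        # The first entry chains to a fixed all-zero hash.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"agent_id": agent_id, "action": action,
                "target": target, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


def silent_agents(inventory: set, log: AuditLog) -> set:
    """Agents known to the inventory that never appear in the log."""
    return inventory - {e["agent_id"] for e in log.entries}
```

Comparing the discovered inventory against the log is the survivorship-bias check: an agent that exists but has left no trace is a question to investigate, not evidence that all is well.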

From Static Defence to Dynamic Cyber Resilience

We have now identified three compounding traps: metrics that reward the wrong behaviour, an asymmetry that structurally favours the attacker, and a blind spot that hides the failures we most need to see. None of them can be solved in isolation. They require a fundamentally different way of thinking about security posture altogether.

The C-suite must completely abandon the notion of a secure, impenetrable perimeter. If a resourced adversary wants in, or if an autonomous agent is going to hallucinate and execute a destructive action, the initial failure is inevitable. Accept it.

The strategic shift is from trying to build an unbreakable wall to building systems that can take a hit, hold together, and get back up fast. Five imperatives define this transition:

Moving Beyond Single Proxies
To defeat Goodhart's Law, we need to abandon single-metric optimisation. Security cannot be measured solely by "vulnerabilities patched" or "alerts resolved." Instead, a slate of diverse, counterbalancing metrics can triangulate the true health of the organisation. Effective strategy involves combining quantitative data with rigorous, qualitative human audits to catch the specification gaming that automated dashboards miss. Systems should be designed to comply with broad ethical and legal alignments rather than narrow, easily manipulated targets.

Practitioner Tip

You Need More Than One Number

"Build a counterbalancing metric framework: pair every volume metric with a quality metric and a coverage metric. 'Vulnerabilities patched' needs 'mean time to contain a critical zero-day' and 'percentage of crown jewel assets with verified coverage.' When all three move together, you are measuring something real."
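The volume, quality, and coverage pairing can be sketched as a simple period-over-period check. The class `MetricTriad`, the function `assess`, the field names, and the sample figures are all illustrative assumptions, not measurements or an established framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricTriad:
    """One reporting period: a volume, a quality, and a coverage metric."""
    vulns_patched: int                 # volume: higher is better
    hours_to_contain_critical: float   # quality: lower is better
    crown_jewel_coverage: float        # coverage: fraction 0.0-1.0, higher is better


def assess(previous: MetricTriad, current: MetricTriad) -> str:
    """Flag periods where volume improved but its counterweights did not."""
    volume_up = current.vulns_patched > previous.vulns_patched
    quality_ok = current.hours_to_contain_critical <= previous.hours_to_contain_critical
    coverage_ok = current.crown_jewel_coverage >= previous.crown_jewel_coverage
    if volume_up and quality_ok and coverage_ok:
        return "consistent improvement"
    if volume_up:
        return "possible gaming: volume rose while quality or coverage fell"
    return "no volume improvement: review manually"


# Hypothetical quarters: patch volume more than triples, but containment
# slows and crown-jewel coverage slips, so the headline number is suspect.
q1 = MetricTriad(vulns_patched=120, hours_to_contain_critical=72.0, crown_jewel_coverage=0.80)
q2 = MetricTriad(vulns_patched=400, hours_to_contain_critical=96.0, crown_jewel_coverage=0.75)
print(assess(q1, q2))
```

A flag from a check like this is a prompt for a qualitative audit, not a verdict; the thresholds and the choice of three metrics would need tuning to each organisation.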

Redefine Identity for the Machine Age
Traditional IAM was built for humans: static roles, periodic reviews, predictable behaviour patterns. AI agents introduce short-lived machine identities that spin up programmatically, request broad access, execute a workflow, and vanish within minutes. Shift to continuous authorization, granting access only for a specific context and a limited timeframe, with constant behavioural monitoring for anomalies. If an agent acting as a financial assistant suddenly tries to access a segregated engineering database, that is a breach signal, not a quirk.

Practitioner Tip

Never Let an Agent Hold Persistent Broad Access

"Apply least-privilege and time-limited access to AI agents with the same rigour you apply to privileged human users. Flag behavioural anomalies immediately and sever the connection automatically. An agent requesting access it has never needed before is telling you something."
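As a rough illustration of least-privilege, time-limited access, the sketch below issues a short-lived grant for named scopes and denies anything outside it. `Grant`, `issue_grant`, `authorize`, the scope strings, and the five-minute default are hypothetical names and values, not the API of any particular IAM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    agent_id: str
    scopes: frozenset
    expires_at: float  # seconds, on the same clock as the `now` arguments


def issue_grant(agent_id: str, scopes, now: float, ttl_seconds: float = 300.0) -> Grant:
    """Grant only the named scopes, and only for a short window."""
    return Grant(agent_id, frozenset(scopes), now + ttl_seconds)


def authorize(grant: Grant, agent_id: str, scope: str, now: float):
    """Return (allowed, reason). A denial is a signal to investigate."""
    if grant.agent_id != agent_id:
        return False, "grant belongs to a different agent"
    if now >= grant.expires_at:
        return False, "grant expired"
    if scope not in grant.scopes:
        return False, "scope outside grant: flag as anomaly"
    return True, "ok"


# A finance assistant with a five-minute, read-only ledger grant.
grant = issue_grant("finance-assistant", {"ledger:read"}, now=1000.0)
print(authorize(grant, "finance-assistant", "ledger:read", now=1100.0))
print(authorize(grant, "finance-assistant", "engineering-db:read", now=1100.0))
```

Passing `now` explicitly keeps the check deterministic and testable; a real deployment would take time from a trusted clock and pair the denial with behavioural monitoring.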

Expanding Right: Run-Time Enforcement
Shifting security left into design and development is necessary but no longer sufficient on its own. Because agents evaluate prompts and call tools in live environments, after-the-fact alerting is completely inadequate. If a warning fires 30 minutes after a production database has been wiped or source code has been exfiltrated, the damage is already done. Security controls must interrupt the execution chain the instant a policy violation occurs.

Practitioner Tip

Intercept, Don't Just Alert

"Define agent policies in plain language and enforce them semantically, not just syntactically. 'Never permit access to financial forecasts' should mean exactly that, regardless of how the request is phrased. Post-facto detection is not a defence at machine speed. It is a log of what went wrong."
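The difference between intercepting and alerting can be shown with a small guard that runs before the tool does. This is a deliberately simplified stand-in: it matches exact tool names and data labels, whereas the semantic enforcement described above would need to interpret intent. `guarded_call`, `PolicyViolation`, and the two blocked sets are hypothetical.

```python
class PolicyViolation(Exception):
    """Raised before a blocked tool call is executed."""


# Hypothetical policy: tools an agent may never invoke and data labels it
# may never touch. Exact matching only; not a semantic engine.
BLOCKED_TOOLS = {"delete_table", "drop_database"}
BLOCKED_LABELS = {"financial_forecast"}


def guarded_call(tool_name, tool_fn, resource_labels, *args, **kwargs):
    """Check policy first; run the tool only if nothing is violated."""
    if tool_name in BLOCKED_TOOLS:
        raise PolicyViolation(f"tool '{tool_name}' is blocked by policy")
    hit = BLOCKED_LABELS & set(resource_labels)
    if hit:
        raise PolicyViolation(f"resource labels {sorted(hit)} are blocked by policy")
    return tool_fn(*args, **kwargs)


def delete_table(name):
    """Stand-in for a destructive tool; never reached when blocked."""
    return f"deleted {name}"


try:
    guarded_call("delete_table", delete_table, [], "orders")
except PolicyViolation as err:
    print("intercepted:", err)
```

Because the check sits in the call path, the destructive function is never invoked; an alert-only design would run it first and report afterwards.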

Ecosystem Interdependence
No organisation is an island. A breach in a third-party vendor, a cloud provider, or an open-source library cascades through the entire digital supply chain. Highly resilient organisations understand they are only as strong as their weakest supplier. They involve security directly in procurement decisions, assess partner security maturity, and run joint crisis simulations so that when a downstream provider goes dark, they can contain the damage and keep running.

A Unified Architecture for Agentic Resilience
Isolated security silos are a fatal vulnerability in the agentic era. Survival requires a unified control layer that fuses visibility, governance, and recovery into a single, coherent system. Defeating survivorship bias demands continuous agent monitoring that illuminates the full attack surface, automatically discovering and mapping all agents and non-human identities across your infrastructure. That foundational visibility feeds directly into dynamic, intent-driven governance: recognising that static proxy metrics will inevitably be gamed, the system must evaluate the true semantic intent behind an action and intercept destructive tool calls in real time before they execute. And because a breached agent can still trigger cascading damage at machine speed, active defences must operate continuously during peacetime, pre-calculating safe recovery paths so that when a catastrophic error occurs, the blast radius can be rolled back instantly.

Putting This Into Practice: Rubrik Agent Cloud

This is the exact challenge we recognised at Rubrik. Enterprises were building powerful agents but deploying them took months because there was no unified way to manage the risk. No single view of what these non-human identities were doing, what data they were accessing, or how to stop them when something went wrong.

Rubrik Agent Cloud (RAC) sits directly between your enterprise applications, your AI agents, and the underlying large language models. It is not a passive monitoring tool. It is an active control layer.

Visibility first. RAC automatically discovers agents operating across your infrastructure, from Microsoft Copilot Studio to Amazon Bedrock to custom-built tools routing through the AI Gateway. It maps exactly what data those agents are authorised to access and maintains an immutable audit log of every action taken. You cannot secure what you cannot see, and RAC closes those blind spots.

Governance that actually works at machine speed. Paper policies are useless against autonomous execution. RAC lets you define policies in plain language, such as "Never permit access to financial forecasts" or "No execution of write or delete tools," and enforces them in real time through the Semantic AI Governance Engine (SAGE). SAGE evaluates the true intent behind an agent's action, not just its surface syntax, and intercepts destructive tool calls before they execute.

Recovery built in from the start. The Preemptive Recovery Engine runs continuously during peacetime, analysing metadata, verifying data integrity, and pre-calculating safe recovery paths so the heavy lifting is already done before anything goes wrong. When an incident hits, you can roll back the exact blast radius instantly and restart at full speed.

And when the system does fail, because eventually it will, Agent Rewind gives you the extinguisher, not just the fire alarm. Because RAC integrates directly with Rubrik Security Cloud's data protection layer, you can correlate a destructive agent action, such as deleting thousands of files in OneDrive or dropping a critical table, and roll back that specific data to the exact moment before the incident. Precise, surgical recovery with no downtime and no data loss.

Three Questions Every Leader Should Be Asking Now

The defining shift of the next decade is not just artificial intelligence. It is artificial autonomy. Leaders must think like their adversaries: assume systems will be breached, proxies will be gamed, and agents will eventually falter or be compromised.

Force the uncomfortable conversations by demanding answers to these three questions:

Are our metrics measuring true resilience, or creating an illusion of control? Are we tracking superficial targets that mask actual risk, or do we have real visibility into exactly what our autonomous agents are accessing and manipulating?
How do we stop our AI agents from gaming their objectives? Does our security rely on passive, post-facto monitoring, or active, run-time enforcement capable of instantly terminating an agent's execution when a policy subversion is detected?
When our defensive metrics fail, can we surgically reverse the damage? Since catastrophic machine error is inevitable in the agentic era, can we precisely roll back the blast radius to a known safe state without causing a system-wide outage?

If your organisation cannot answer these with certainty, your current metrics are masking systemic vulnerabilities. Stop relying on the illusion of secure perimeters. The organisations that get this right, the ones that build systems to unleash agents while keeping a firm grip on what those agents can actually do, will have a decisive edge. The ones that don't will find out the hard way.

Resources

Contributed by

Vigyan Jain

Advisory Sales Engineer, ANZ, Rubrik

Vigyan Jain is a technology leader with over 18 years of global experience driving strategic innovation for high-growth enterprise software companies. Currently an Advisory Sales Engineer at Rubrik, he is at the forefront of the industry’s shift towards cyber resilience while spearheading the move toward AI transformation. Throughout his career at industry giants including MongoDB, Oracle, and McKinsey & Company, Vigyan has established himself as a trusted advisor to decision makers. His leadership philosophy is rooted in authenticity and open communication, paired with a relentless drive for customer success and personal development.
