Company | May 21, 2026 | 11 min read

Anatomy of an Unstructured Data Attack

 

Unstructured data accounts for 92% of your enterprise data. We’re talking about petabytes of PDFs, emails, slide decks, images, and more. 

While there’s a ton of unstructured data, it’s often treated as a secondary priority. It’s a victim of the "obfuscation by volume" problem: there is so much of it that many organizations assume it is too big to protect, too difficult to classify, and too slow to recover.

When organizations haven't classified their data, mapped its business value, or established clear ownership and retention policies, they lose the ability to govern it. And data you can't govern, you also can't protect or recover.

A recent webinar Rubrik held with IDC revealed that this blind spot is exactly what makes unstructured data so attractive to attackers. 

To understand the stakes, we’ll dissect a real-world 6-petabyte ransomware attack. We’ll show how a single root compromise can paralyze an entire NAS infrastructure, what tactical visibility it takes to pull off a high-stakes 36-hour recovery sprint, and why the common belief that "replication is backup" is perhaps the most expensive misunderstanding in modern IT.

 




Part 1: The Vulnerability

How does a root compromise on a NAS filer lead to a 6-petabyte headache?

Network-attached storage (NAS) environments are a particularly acute data management blind spot. Files accumulate over years across shared drives and file systems, typically without defined ownership, sensitivity labels, or descriptive metadata. Most organizations with a NAS estate of this scale can't answer basic data management questions, such as: What percentage of this data is active vs. archived? Who owns it? Does it contain PII or PHI?

That absence of data intelligence is what makes the blast radius so large when something goes wrong.
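The first of those questions (active vs. archived) can be approximated even without a dedicated tool. The sketch below is a minimal illustration, not how any particular product works: it walks a mounted share and splits bytes by last-modified time. The function name `summarize_share` and the 90-day cutoff are assumptions made for this example, and on a multi-petabyte filer you would want something far faster than a single-threaded walk.

```python
import os
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShareSummary:
    active_bytes: int = 0
    archived_bytes: int = 0

    @property
    def active_fraction(self) -> float:
        total = self.active_bytes + self.archived_bytes
        return self.active_bytes / total if total else 0.0


def summarize_share(root: str, active_days: int = 90,
                    now: Optional[float] = None) -> ShareSummary:
    """Walk a mounted share and split bytes into active vs. archived.

    A file counts as "active" if it was modified within `active_days`.
    The 90-day default is an arbitrary assumption for illustration.
    """
    now = time.time() if now is None else now
    cutoff = now - active_days * 86400
    summary = ShareSummary()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime >= cutoff:
                summary.active_bytes += st.st_size
            else:
                summary.archived_bytes += st.st_size
    return summary
```

Modification time is only a rough proxy for business activity. Ownership and PII/PHI content need classification that a file walk alone cannot provide.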

The victim in this case was an eDiscovery firm, and for an eDiscovery firm, data availability is the pulse of the business. Their core value lies in searching and retrieving data for active legal cases. Any downtime threatens court deadlines and risks irreparable brand damage.

While the ultimate recovery was successful, the initial breach was devastating. A root compromise was triggered by a vulnerability within their primary NAS filer.

The resulting destruction was total and symmetrical across the environment:

  • Primary Impact: Attackers deleted all active data by formatting the system.

  • Secondary Impact: The system was locked down with encryption, and a ransom note was left behind.
     

With six petabytes compromised across both primary and replicated NAS systems, this case exposes a critical blind spot. The sheer scale of unstructured data means that an attack can consume the entire infrastructure, presenting IT teams with a recovery challenge that appears nearly impossible.

 

Part 2: The 36-Hour Sprint

When you are facing six petabytes of encrypted data, you cannot simply hit "restore all" and expect the business to function. Recovery at scale requires a hit list: a prioritized understanding of what data matters most right now. Since this eDiscovery firm was working on active cases, they couldn’t treat all six petabytes as equal. They had to restore, as quickly as possible, the data that mattered most to their priority customers.

They had the visibility to identify which customers had active legal proceedings and which datasets were critical for that week’s business operations. So within the first 36 hours, the team restored access to these critical, high-value datasets. The remaining data (the archives and less urgent files) took 10 days for a full recovery.

This 36-hour win was only possible because they had mapped their data value to business outcomes with a proactive data management practice: they knew which data was active, which belonged to priority clients, and which could wait. That kind of intelligence is the core of effective unstructured data management, and it proved to be the difference between a crisis and a catastrophe. Unfortunately, that level of data intelligence is still rare.
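To make the "hit list" idea concrete, here is a minimal sketch of a prioritized restore plan. Everything in it is hypothetical: the `Dataset` fields, the dataset names and sizes, and the throughput figure. The 25 TB/hour value is simply what restoring 6 PB over 10 days would imply at a constant rate (6,000 TB / 240 hours), which is an assumption the case description does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    size_tb: float
    active_case: bool      # tied to a live legal proceeding
    priority_client: bool  # belongs to a priority customer


def restore_order(datasets):
    """Sort so active-case data comes first, then priority-client data.

    Ties are broken by size, smallest first, so more datasets return early.
    """
    return sorted(
        datasets,
        key=lambda d: (not d.active_case, not d.priority_client, d.size_tb),
    )


def first_wave(datasets, throughput_tb_per_hour, window_hours=36):
    """Return the leading part of the restore order that fits in the window."""
    budget_tb = throughput_tb_per_hour * window_hours
    wave, used = [], 0.0
    for d in restore_order(datasets):
        if used + d.size_tb > budget_tb:
            break
        wave.append(d)
        used += d.size_tb
    return wave


# Hypothetical inventory, for illustration only.
inventory = [
    Dataset("archive-2015", 2000, active_case=False, priority_client=False),
    Dataset("case-A", 120, active_case=True, priority_client=True),
    Dataset("case-B", 300, active_case=True, priority_client=False),
    Dataset("client-C-closed", 80, active_case=False, priority_client=True),
]

for d in first_wave(inventory, throughput_tb_per_hour=25):
    print(d.name)
```

With these made-up numbers, the two active cases and the small priority-client dataset fit in the first 36 hours, while the 2 PB archive waits. The sort key is the whole point: it can only be written if someone has already recorded which data is active and who it belongs to.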

As Jennifer Glenn from IDC noted, while 72% of companies feel they have mapped their intellectual property, only 33% have effectively mapped their sensitive PII and PHI. Without that mapping, a strategic sprint to restore the most important data can turn into a 10-day crawl for everything in the system.

 

Part 3: Replication is Not Backup

The most dangerous assumption in unstructured data management is that replication equals protection. Many organizations believe that by having a primary and secondary filer (DR site), they are safe. This attack proved the opposite.

Replication is designed to mirror data. If a user deletes a file on the primary site, the system faithfully replicates that deletion to the secondary site. In a ransomware scenario, if an attacker gains root access and formats the primary system or encrypts the files, those instructions are often replicated instantly to the secondary site.

As seen in this case, both the primary and secondary filers were compromised because the vulnerability existed across both.
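A toy model makes the difference easy to see. The sketch below is deliberately simplified and does not represent any vendor's replication or backup implementation: `ReplicatedFiler` and `SnapshotBackup` are hypothetical classes that use in-memory dictionaries in place of real storage.

```python
class ReplicatedFiler:
    """Toy model: every change on the primary is mirrored to the replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, path, data):
        self.primary[path] = data
        self.replica[path] = data      # the write is mirrored

    def delete(self, path):
        self.primary.pop(path, None)
        self.replica.pop(path, None)   # the delete is mirrored too


class SnapshotBackup:
    """Toy model: point-in-time copies that later changes do not alter."""

    def __init__(self):
        self._snapshots = []

    def take(self, files):
        self._snapshots.append(dict(files))  # store a copy, not a reference

    def restore(self, index=-1):
        return dict(self._snapshots[index])


filer = ReplicatedFiler()
backup = SnapshotBackup()

filer.write("/cases/a/exhibit1.pdf", b"original")
filer.write("/cases/a/exhibit2.pdf", b"original")
backup.take(filer.primary)

# An attacker with root access encrypts one file in place and deletes another.
filer.write("/cases/a/exhibit1.pdf", b"ENCRYPTED")
filer.delete("/cases/a/exhibit2.pdf")

print(filer.replica)     # the replica mirrors the damage
print(backup.restore())  # the snapshot still holds both original files
```

The replica ends up in exactly the state the attacker left the primary in. Only the point-in-time copy, which the attacker's writes never touch in this model, still contains the original data.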

This confusion between replication and backup often reflects a deeper gap in data management strategy: without defined retention tiers, recovery point objectives, and clear policies for what data must be immutably preserved, organizations default to "more copies equals safer," a shortcut that ransomware attackers have learned to exploit.

To achieve true cyber resilience, organizations must move beyond replication and embrace:

  • Backups that cannot be modified, encrypted, or deleted by attackers, even with root credentials.
     

  • Logical or physical separation that prevents an attack on the production environment from reaching the backup data.
     

  • The ability to search through billions of files to identify exactly what was impacted so that recovery can be surgical rather than total.
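The third capability, knowing exactly what was impacted, comes down to comparing a known-good index of files against the current state. The sketch below shows the idea with plain dictionaries mapping paths to content hashes. The function name `impacted_paths` and the sample paths are hypothetical, and an index covering billions of files would need a purpose-built, searchable catalog rather than in-memory dictionaries.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 of a file's contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def impacted_paths(baseline, current):
    """Compare two {path: content_hash} indexes.

    Returns (deleted, modified): paths missing from `current`, and paths
    whose hash changed, for example because the file was encrypted in place.
    """
    deleted = sorted(p for p in baseline if p not in current)
    modified = sorted(
        p for p in baseline if p in current and baseline[p] != current[p]
    )
    return deleted, modified


# Hypothetical pre-attack index and post-attack state.
baseline = {
    "/cases/a/exhibit1.pdf": content_hash(b"exhibit 1"),
    "/cases/a/exhibit2.pdf": content_hash(b"exhibit 2"),
    "/cases/b/notes.docx": content_hash(b"notes"),
}
current = {
    "/cases/a/exhibit1.pdf": content_hash(b"exhibit 1"),       # untouched
    "/cases/a/exhibit2.pdf": content_hash(b"\x00ciphertext"),  # encrypted
}

deleted, modified = impacted_paths(baseline, current)
print("restore:", deleted + modified)
```

Only the deleted and modified paths need to come back from backup. The untouched file is left alone, which is what makes the recovery surgical rather than total.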
     

The blind spot of unstructured data is growing every day. Nowadays, attackers specifically target backups (a tactic seen in 50% of modern attacks). The question isn't whether you have a copy of your data. It’s whether you have the visibility and resilience to bring it back when it counts.

Building that visibility starts before any attack, with foundational data management work: knowing what data you have, where it lives, how sensitive it is, and which of it your business cannot survive without.

 

The Bottom Line

Ready to get a clear view into your unstructured data blind spot?

Join us live for a guided workshop on unstructured data protection. 

Learn how to achieve 10x the performance of legacy recovery.

For the more visual learners, take this self-guided tour to get a demo of the technology that saved the eDiscovery business from days or weeks of sluggish recovery.

 
