Recovering Fast from Ransomware Attacks: The Magic of an Immutable Backup Architecture

Summary

Ransomware has been blasting my news feeds daily for years. Each article tells the story of an organization that can no longer access its business-critical data because attackers have encrypted its production files and storage devices. According to the Emsisoft Malware Lab, ransomware attacks in 2019 impacted at least 966 government agencies, educational establishments, and healthcare providers at a potential cost in excess of $7.5 billion. Although cybersecurity teams have invested in a myriad of protection tools, extortionists continue to find new mechanisms to encrypt organizations’ data.

Backups are one of the most important defenses against ransomware, if not the most important. But if your backups can be corrupted, attackers will use that against you. Advanced ransomware now targets backups, modifying them or wiping them out completely, which eliminates your last line of defense and drives large ransom payouts. Rubrik’s uniquely immutable filesystem natively prevents unauthorized access to or deletion of backups, allowing IT teams to quickly restore to the most recent clean state with minimal business disruption. This blog walks you through our one-of-a-kind immutable architecture and the robust security controls that harden your data against cyber attacks.

The Effects of Ransomware

Ransomware is designed to encrypt your data so that it is no longer usable. Often, this means encrypting data held on primary storage to overwhelm IT, which then requires massive recovery efforts from tape or other archives. Additionally, lower-level encryption of the Master Boot Record (MBR) or other operating-system-level encryption is used to prevent booting and other common operations. For virtualized environments, the shared storage used to host virtual machines, such as NFS-backed datastores, is a primary target. This can effectively bring down critical services in an organization. The attackers then demand a ransom to unlock the data so that services can be resumed.

How Rubrik Has Helped Customers

Several customers have successfully survived a ransomware attack through the use of our immutable solution and instant recoveries as part of their defense in depth strategy. 

For example, the City of Durham detected a ransomware attack on Friday, March 6, and their leaders credited their quick response to Rubrik’s backup solution. Durham Mayor Steve Schewell said, “The city can be assured that our backups are very good because they’re immutable. [This means that] they could not be consumed by ransomware.” As a result, they were able to quickly restore critical city services, including access to 911. In addition, Kerry Goode, Durham CIO, emphasized that core business systems, including ones that manage payroll, were back online by the start of the business week. 

Kern Medical Center discovered a large ransomware attack in June 2019 when users reported they couldn’t access their systems. They were able to recover 100% of the impacted systems protected by Rubrik within minutes, including recovering their business-critical electronic medical record system. CTO Craig Witmer said, “After the incident, we were so impressed that we moved more of our legacy systems to Rubrik and are fully confident that Rubrik’s immutable backups will protect us from future incidents.”

How Do You Recover from a Ransomware Attack?

Data backups can be an effective way to restore data that has been locked/encrypted by the attack. However, what if your backup data is also encrypted or deleted by a ransomware attack? How do you ensure that your backup data is not vulnerable to these attacks? 

The Key Is Immutable Backups

While primary storage systems need to be open and available for client systems, your backup data should be immutable. This means that once data has been written it cannot be read, modified, or deleted by clients on your network. This is the only way to ensure recovery when production systems are compromised.

This goes well beyond simple file permissions, folder ACLs, or storage protocols. The concept of immutability needs to be baked into the backup architecture so that no security exposure can tamper with the backups. 

Rubrik Is Designed for Immutability

Rubrik uses an immutable architecture by combining an immutable filesystem with a zero trust cluster design in which operations can only be performed through authenticated APIs.

Rubrik’s approach contrasts with data management systems built on general-purpose storage, which use standard protocols such as NFS or SMB to advertise their availability to a wide assortment of clients. We often find that these solutions have limited or ineffective means for securely transacting data and, in some cases, leave files in their native format while allowing clients to read the backup data directly. This is a breach of confidentiality and puts an extra burden on the customer to secure the storage independently of their data management solution.

An Immutable Distributed Filesystem

One of our first design decisions was to construct Atlas, an immutable Filesystem in Userspace (FUSE) that was largely POSIX-compliant. This provides tight controls over which applications can exchange information, how each data exchange is transacted, and how data is arranged across physical and logical devices. Atlas is custom designed to be a distributed and immutable file system for writing and reading data for other Rubrik services.

Immutability is provided across two layers: the logical layer (Patch Files, Patch Blocks) and the physical layer (Stripes, Chunks). The dynamics between these two layers will be explained further in the next few sections.

The Logical Layer
All customer data brought into the system is written into a proprietary sparse file called a Patch File. These are append-only files (AOFs), meaning that your data can only be added to the Patch File while it is marked as being open. All of the customer snapshot and journal data is held within Atlas, which enforces the use of Patch Files in the underlying directory structure. This powerful filesystem will refuse writes at the API level that are not append-only, such as situations in which the write offset value does not equal the file size. Atlas has total control over how and where customer data is written.
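The append-only rule described above can be illustrated with a minimal sketch. This is not Atlas or any Rubrik API: the class `AppendOnlyFile` and its methods are hypothetical names, and the file is modeled as an in-memory buffer purely to show the check that a write is accepted only when its offset equals the current file size and the file is still open.

```python
class AppendOnlyFile:
    """Toy in-memory model of an append-only file (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._open = True

    @property
    def size(self) -> int:
        return len(self._data)

    def write(self, offset: int, payload: bytes) -> int:
        """Append payload; refuse any write that would touch existing bytes."""
        if not self._open:
            raise PermissionError("file is closed; no further writes are accepted")
        if offset != self.size:
            # A write at any offset other than the end is not an append.
            raise PermissionError(
                f"non-append write rejected: offset {offset} != size {self.size}"
            )
        self._data.extend(payload)
        return self.size

    def close(self) -> None:
        """Mark the file as no longer open; it becomes read-only."""
        self._open = False

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._data[offset:offset + length])
```

In this model, an attempt to overwrite byte 0 of a file that already holds data raises `PermissionError`, as does any write after `close()`, while reads are always allowed.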

If your backup data has been modified, then it’s essentially worthless. We address this by generating a checksum for each Patch Block within a Patch File. These checksums are computed and written to a Fingerprint File stored alongside the Patch File. Rubrik always does a fingerprint check before committing any data transformations. This ensures that the original file remains intact, with forced validation during read operations.
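As a rough sketch of per-block fingerprinting, the example below computes one checksum per fixed-size block and later reports which blocks no longer match. The hash algorithm (SHA-256), the block size, and the function names `fingerprint` and `corrupted_blocks` are assumptions made for illustration; the post does not say which algorithm or block layout Rubrik uses.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, not taken from the post


def fingerprint(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def corrupted_blocks(
    data: bytes, fingerprints: list[str], block_size: int = BLOCK_SIZE
) -> list[int]:
    """Return indices of blocks whose checksum differs from the stored fingerprint.

    Missing or extra blocks are reported as mismatches as well.
    """
    current = fingerprint(data, block_size)
    count = max(len(current), len(fingerprints))
    return [
        i
        for i in range(count)
        if i >= len(current) or i >= len(fingerprints) or current[i] != fingerprints[i]
    ]
```

A restore or transformation step would then refuse to proceed whenever `corrupted_blocks` returns a non-empty list.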

In order to counter a ransomware attack, the original, validated data must be restored from backup. Rubrik routinely verifies the Patch Blocks against their checksums to ensure data integrity at the logical Patch Block level. Patch Files are not exposed to any external systems or customer administrator accounts. This ensures that meticulous care is taken to restore exactly what you originally stored in a backup.

In a traditional approach, administrative access is granted to the filesystem – especially when using general purpose storage – which presents further confidentiality and integrity challenges and gives “Leakware” another attack vector. In addition, many other solutions simply restore whatever data is located in the backup folder or volume without performing validation and other due diligence on the data.

The Physical Layer
While the logical layer focuses on data integrity at the file level, the physical layer focuses on writing customer data across the immutable cluster to achieve data integrity and data resiliency. To do this, Patch Files are logically divided into fixed-length segments called Stripes. As each Stripe is written, the AOF computes a Stripe-level checksum, which it stores within that Stripe’s Metadata.

Stripes are further divided into physical Chunks stored on physical disks held within the Rubrik cluster. Activities such as replication and erasure coding occur at the Chunk level. Just as with Patch Files, as each Chunk is written, a Chunk checksum is computed and stored in the Stripe Metadata alongside the list of chunks. These checksums are periodically recomputed as part of Atlas’ background scan by reading the physical Chunks and comparing against the checksums in the Stripe Metadata. Additionally, if a data rebuild is needed, the resiliency provided by erasure coding is automatically leveraged in the background.
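The Stripe and Chunk bookkeeping can be sketched as follows. This is a simplified model under stated assumptions: CRC32 stands in for whatever checksum Atlas actually uses, `StripeMetadata`, `write_stripe`, and `scrub` are hypothetical names, and replication and erasure-coded rebuilds are left out entirely.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StripeMetadata:
    """Checksums recorded when a stripe is written (hypothetical layout)."""
    stripe_checksum: int
    chunk_checksums: list = field(default_factory=list)


def split(data: bytes, size: int) -> list:
    """Cut data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def write_stripe(stripe: bytes, chunk_size: int):
    """Divide a stripe into chunks and record a checksum for each."""
    chunks = split(stripe, chunk_size)
    meta = StripeMetadata(
        stripe_checksum=zlib.crc32(stripe),
        chunk_checksums=[zlib.crc32(chunk) for chunk in chunks],
    )
    return chunks, meta


def scrub(chunks: list, meta: StripeMetadata) -> list:
    """Background-scan step: recompute chunk checksums, return failing indices."""
    return [
        index
        for index, (chunk, expected) in enumerate(zip(chunks, meta.chunk_checksums))
        if zlib.crc32(chunk) != expected
    ]
```

In a real system, any index returned by `scrub` would be a candidate for rebuilding from the redundant data that erasure coding provides; that step is not modeled here.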

Zero Trust Cluster Design

Traditional approaches to cluster security often rely on a “full trust” model in which all members of the cluster are able to communicate with one another. In some cases, this includes root-level authority, no mutual authentication checks, and the ability to read or modify customer data held within the filesystem. This is a weak point in a defense in depth architecture: if backup data can be compromised, there is no path to restoration when disruption occurs.

Secured Cluster Communications
Each cluster has some number of nodes that need to communicate with one another. This means we need to validate each node that wants to exchange data. For many solutions, there is little to nothing protecting node-to-node communication. At Rubrik, all of our intra-node and inter-cluster communication, as well as communication with external applications, uses the TLS protocol with certificate-based mutual authentication.

Rubrik does not use insecure protocols, such as NFS or SMB, to relay information within the cluster; all communication is performed through secure and trusted channels. In fact, all our internal communications use TLS 1.2 with strong cipher suites and Perfect Forward Secrecy (PFS).
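For readers who want to see what these settings look like in practice, here is a minimal sketch using Python’s standard `ssl` module. It is not Rubrik’s configuration: the function name `make_node_server_context`, the certificate file parameters, and the specific cipher string are assumptions chosen to illustrate a TLS 1.2 floor, required client certificates (mutual authentication), and ECDHE key exchange for forward secrecy.

```python
import ssl


def make_node_server_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate.

    The file arguments are hypothetical paths; when omitted, the context is
    still configured but cannot complete a handshake.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a peer certificate on the server side gives mutual authentication.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Restrict TLS 1.2 suites to ECDHE key exchange with AES-GCM;
    # ephemeral ECDHE keys are what provide forward secrecy.
    ctx.set_ciphers("ECDHE+AESGCM")
    if ca_file:
        # CA used to validate certificates presented by peer nodes.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This node's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A peer that connects without a certificate signed by the configured CA fails the handshake, so unauthenticated nodes never reach the application layer.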

Each Rubrik cluster shipped to a customer uses strong, randomized passwords on a per-node basis. There is no concept of an “admin/admin” style of default local authentication that is easily searchable on the web and would add an attack vector.

Systems Hardening Standards
There are numerous other elements in position to protect the integrity of the system through internal hardening standards. Here are a few that help combat ransomware:

Authenticated APIs

Rubrik adopted an API-first design as part of the architecture. We require authentication to all endpoints that are used to operate the solution. Authentication can be handled via credentials or secure token. This includes environments using our Role-Based Access Control (RBAC) or Multi-tenancy features to logically divide the roles, features, and resources that are under management. Rubrik’s CLI, SDKs, and other tools consume the API and are held to the same security requirements.

API endpoints that control the underlying behavior of the system require an additional level of authorization that can only be supplied from a certified technical support engineer. This prevents a malicious actor from being able to alter the behavior of a Rubrik cluster.
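The two-tier model described above (every endpoint authenticated, with some endpoints reserved for support engineers) might look roughly like the sketch below. Everything in it is hypothetical: the endpoint paths, the role names, the in-memory `TOKENS` store, and the `handle` function are illustrative stand-ins, not Rubrik’s API.

```python
import hmac
import secrets

# Hypothetical in-memory token store: token -> role.
TOKENS = {}

# Hypothetical endpoints and the roles allowed to call them.
ENDPOINTS = {
    "/snapshots": {"admin", "operator", "support"},
    "/internal/config": {"support"},  # system-behavior endpoint, support only
}


def issue_token(role: str) -> str:
    """Create a random token bound to a role."""
    token = secrets.token_hex(16)
    TOKENS[token] = role
    return token


def role_for(token: str):
    """Look up a token's role using constant-time comparison."""
    for known, role in TOKENS.items():
        if hmac.compare_digest(known, token):
            return role
    return None


def handle(path: str, token=None) -> int:
    """Return an HTTP-style status code for a request."""
    role = role_for(token) if token else None
    if role is None:
        return 401  # unauthenticated callers learn nothing else
    if path not in ENDPOINTS:
        return 404
    if role not in ENDPOINTS[path]:
        return 403  # authenticated, but not authorized for this endpoint
    return 200
```

Authentication is checked before anything else, so an unauthenticated caller cannot distinguish a valid endpoint from an invalid one.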

Conclusion

Numerous resources on the Internet advocate for a Defense in Depth strategy. This combines efforts across employee education and enablement, rapid deployment of patches, and a solid backup and recovery plan. In this post, I described how Rubrik uses a combination of data immutability and a zero trust cluster design to build a great product for protecting and recovering data. We help organizations further strengthen their ransomware response strategy with Radar, our application that increases visibility into the scope of an attack. This allows organizations to quickly pinpoint which applications and files were impacted and where they reside to further minimize business impact.

Many of our customers turn to Rubrik on their worst day. They need to be able to reliably recover from ransomware attacks to ensure minimal downtime of their critical services. A product with a truly immutable architecture provides our customers the peace of mind that when they need to, they can always access the data to recover from such debilitating attacks.