Using Machine Learning for Anomaly Detection and Ransomware Recovery

Let’s face it: preventing a ransomware attack is hard. Some may say it is nearly impossible, even with the latest technology and a sound defense-in-depth approach. So, if there’s no surefire way to prevent an attack, recovery is the next best option. Within a ransomware recovery plan, though, there lie many decisions and nuances. For example, what should the priority be: quick recovery and return to operations, forensics to determine the cause of the attack, or minimizing data loss during recovery?

If a quick recovery is prioritized, then organizations are generally sacrificing the ability to do forensics to determine how the attack occurred and propagated, which opens the door to a repeat attack. They’re also deciding to forgo determining what data was affected and what was not. This means that they’ll be recovering data that wasn’t touched during the attack and overwriting good data with older data during the recovery. By prioritizing forensics, organizations are investing in making sure the same attack cannot happen again. Doing this, however, takes time, expertise, and tools, which can keep the business down far longer than it would like. Similarly, determining exactly what data was impacted by the attack takes time, because administrators have to pore over logs to assess the situation.

Each one of those priorities brings with it a substantial downside, which makes it very difficult to choose. So, newer technologies, like Machine Learning (ML), have been sought after to address those choices. Conventional technologies use algorithms that require humans to explicitly program actions. While these algorithms can be quite complex and powerful, they can only do what the programmer has enabled them to do. In other words, if the algorithm encounters an unforeseen situation, it will likely cause an error or otherwise fail to reach the desired result. For example, a programmer might need to create an algorithm to analyze a batch of pictures and determine whether a dog is in each picture. That programmer would have to decide upfront what characteristics a dog has. The algorithm then makes a binary decision about whether the picture has a dog or not, based only on those predefined characteristics. Imagine all of the possibilities and the difficulties!

On the other hand, ML is a system that can learn and adapt without being given explicit instructions. ML can use algorithms and statistical models to analyze patterns in data. Going back to the example, an ML system doesn’t require that upfront definition of what is a dog and what is not. For the system to know what a dog is, the programmer can “feed” the system pictures that contain dogs, tagged by the programmer to let the system know that those are indeed pictures of dogs. This allows the ML system to “learn” what a dog is. Conversely, pictures without dogs can be fed into the system and tagged as not having dogs. The ML system can then set off on its analysis quest to find pictures that contain dogs while also continuing to refine and tune (or learn) its understanding. This continuous learning and adaptation is key.
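To make the idea of learning from tagged examples concrete, here is a minimal sketch in Python. It assumes each picture has already been reduced to two made-up numeric features, and it uses a simple nearest-centroid classifier as a stand-in for a real image model. The function names and feature values are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def train_centroids(samples):
    """Average the feature vectors of each label: {label: centroid}."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical two-number "image features"; real systems learn from pixels.
labeled = [
    ([0.9, 0.8], "dog"), ([0.8, 0.9], "dog"),
    ([0.1, 0.2], "not_dog"), ([0.2, 0.1], "not_dog"),
]
model = train_centroids(labeled)
print(predict(model, [0.85, 0.75]))
```

Nobody wrote a rule describing what a dog looks like here. The boundary between the two classes comes entirely from the tagged examples, and adding more examples moves it.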

Now, let’s take a look at how Machine Learning can help when we’re dealing with ransomware.

Applying Machine Learning Models to Ransomware Recovery

An organization’s backup data is rich with information. This includes the content itself along with metadata such as path, size, ACL details, UIDs, GIDs, and other attributes. The Rubrik Zero Trust Data Security™ platform can then feed that information into a machine learning pipeline that forms intelligent insights that streamline the ransomware recovery decision-making process. Let’s break down how Rubrik applies machine learning models to your data.

At Rubrik, we refer to backups as snapshots. These snapshots are taken via the on-premises Rubrik Cloud Data Management (CDM) platform. Once a snapshot is completed in CDM, a filesystem metadata diff (FMD) file is created. This FMD file contains a list of entries corresponding to files that have been created, deleted, or modified, and is essentially a log of the file changes that have taken place since the previous backup. Instead of running the computationally intensive machine learning pipeline locally in CDM, we upload the FMD files to Rubrik Polaris to be processed by the machine learning pipeline residing in the cloud.
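As a rough illustration of what a metadata diff between two snapshots might look like, the Python sketch below compares two path-to-metadata maps and lists created, deleted, and modified entries. This is not the actual FMD format. The `FileMeta` fields and the `metadata_diff` function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    size: int   # file size in bytes
    mtime: int  # modification time, seconds since epoch


def metadata_diff(previous, current):
    """Compare two {path: FileMeta} maps and list what changed between them."""
    entries = []
    for path in sorted(previous.keys() | current.keys()):
        if path not in previous:
            entries.append(("created", path))
        elif path not in current:
            entries.append(("deleted", path))
        elif previous[path] != current[path]:
            entries.append(("modified", path))
    return entries


snap1 = {"/docs/a.txt": FileMeta(100, 1), "/docs/b.txt": FileMeta(200, 1)}
snap2 = {"/docs/a.txt": FileMeta(150, 2), "/docs/c.txt": FileMeta(50, 2)}
diff = metadata_diff(snap1, snap2)
for change, path in diff:
    print(change, path)
```

Only paths and attributes appear in the result, never file contents, which is why a diff like this is small enough to ship elsewhere for analysis.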

Rubrik Polaris is a SaaS Data Management platform that offers services such as cloud-native protection for AWS, Azure, Google Cloud, and Microsoft 365; Rubrik Polaris Radar for ransomware detection and recovery; and Rubrik Polaris Sonar for compliance.

To be clear, only FMD files and their associated metadata are transmitted to Rubrik Polaris, which means that customers need not be concerned about their sensitive data being transmitted outside of their data center. So, not only do we have zero impact on the production infrastructure and applications, including backups, but we also get to leverage the scale-out compute performance of the public cloud with Rubrik Polaris, all while ensuring a security-first approach to data.

Training the Model

Once the FMDs land in Rubrik Polaris, we leverage a deep neural network (DNN) to build out a full perspective of what is going on with the workload. The DNN is trained using supervised learning, which consists of presenting labeled data to a machine learning model to give it a training signal from which to learn. The DNN can then identify trends that exist across all samples and classify new data by their similarities without requiring human input. Remember the previous dog example? This is analogous to the system seeing more and more dog pictures and using those additional data points to become more accurate over time.

So, let’s look at how the DNN ultimately decides whether there has been a ransomware attack. The DNN analyzes data via a machine learning pipeline for Rubrik Polaris Radar that consists of two models: an anomaly detection model and an encryption detection model.

These models and flow can be summarized as follows:

File System Behavior Analysis: Performs behavioral analysis on the file system metadata information by looking at items like number of files added, number of files deleted, and so forth.
File Content Analysis: If an anomaly is detected during the previous step, Radar performs an analysis to determine if there is a characteristic sharp increase in file entropy that signals a ransomware attack.
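The first step, comparing a snapshot's change counts against a historical baseline, can be sketched with a much simpler technique than the DNN that Radar uses. The Python example below scores a snapshot by its largest z-score against past snapshots. The counts, the `anomaly_score` function, and the `THRESHOLD` value are all hypothetical.

```python
from statistics import mean, stdev


def anomaly_score(history, latest):
    """Largest z-score of any change count versus the historical baseline."""
    worst = 0.0
    for key, value in latest.items():
        past = [snapshot[key] for snapshot in history]
        spread = stdev(past) or 1.0  # avoid dividing by zero on a flat history
        worst = max(worst, abs(value - mean(past)) / spread)
    return worst


# Hypothetical per-snapshot change counts taken from metadata diffs.
history = [
    {"added": 10, "deleted": 2, "modified": 30},
    {"added": 12, "deleted": 3, "modified": 28},
    {"added": 9, "deleted": 1, "modified": 33},
    {"added": 11, "deleted": 2, "modified": 29},
]
normal_day = {"added": 10, "deleted": 2, "modified": 31}
attack_day = {"added": 900, "deleted": 880, "modified": 40}

THRESHOLD = 3.0  # hypothetical cut-off, not a value from Radar
for name, counts in (("normal", normal_day), ("attack", attack_day)):
    flagged = anomaly_score(history, counts) > THRESHOLD
    print(name, "anomalous" if flagged else "ok")
```

A mass encrypt-and-rename event typically shows up as a burst of additions and deletions far outside the baseline, which is the kind of signal that would trigger the second, content-level step.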

Overall, this pipeline excels at creating a historical baseline that gets refined over time. If an anomaly alert is generated, Radar can go deeper into the content of the files to look for signs of encryption and compute an encryption probability using a statistical model. Because this deeper analysis runs only after an anomaly alert, the pipeline can compute entropy features and measure the level of encryption in the file system efficiently.
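To show why entropy is a useful signal for encryption, here is a minimal Python sketch that computes Shannon entropy in bits per byte. Plain text tends to score well below the 8.0 maximum, while encrypted or random data sits close to it. Random bytes from `os.urandom` stand in for encrypted content here. This is an illustration of the general idea, not Radar's statistical model.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


plain = b"quarterly report: revenue up, costs flat. " * 200
random_like = os.urandom(len(plain))  # stands in for encrypted content

print(f"plain text:  {shannon_entropy(plain):.2f} bits/byte")
print(f"random data: {shannon_entropy(random_like):.2f} bits/byte")
```

A sharp jump in this kind of measurement across many files between two snapshots is the "characteristic sharp increase in file entropy" described above.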

Testing Known Live Ransomware Samples

While all of the conceptual design behind Radar may sound great, how do we know it works? After all, it’s not like you are going to let loose various ransomware variants into your production environment just to see if Radar sends over an alert, right? :-)

Radar’s detection model was trained, validated, and tested against a large amount of real-world labeled data containing a diverse mix of snapshots from real-world usage, simulated usage, and snapshot changes caused by various ransomware and malicious activities.

For the machine learning pipeline, we followed the standard practice of segmenting the labeled data into three categories: training, validation, and testing. This enabled us to check that the model was not overfit: the training and validation sets are used to tune the model, while the testing data is used only to evaluate the model on data it has not seen.
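A three-way split like this can be sketched in a few lines of Python. The 70/15/15 proportions, the fixed seed, and the `three_way_split` function are assumptions for the example, since the actual ratios used for Radar are not stated.

```python
import random


def three_way_split(items, train=0.70, validation=0.15, seed=0):
    """Shuffle once, then cut into training, validation, and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is held out for testing
    )


# Hypothetical labeled snapshot identifiers.
snapshots = [f"snapshot-{i}" for i in range(100)]
train_set, val_set, test_set = three_way_split(snapshots)
print(len(train_set), len(val_set), len(test_set))
```

The important property is that the three sets never overlap, so the score on the testing set reflects data the model was neither trained nor tuned on.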

Conclusion

Even before today’s accelerated proliferation of ransomware, Rubrik had set out on a mission to help customers recover from ransomware. Our lead security engineer was quoted as saying, “With an effective backup solution, ransomware can ideally be reduced to a minor inconvenience.” Today, we can see that many of our customers are finally able to get a clear picture of the anomalies that regularly impact their environment.

As ransomware becomes increasingly sophisticated and continues to adapt, successful attacks are becoming more prevalent. Powered by machine learning, Rubrik Polaris Radar enables enterprises to respond quickly and automatically to the latest threats, accelerating recovery while minimizing business disruption and data loss.

To learn more about how Rubrik can help you recover quickly from ransomware, visit https://www.rubrik.com/ransomware.
