Do No Harm: How We Schedule Backups Without Impacting Production
One of our principal tenets is to “do no harm” to production systems. Taking a backup consumes resources from the primary environment, including CPU, I/O, and network bandwidth. Rubrik’s objective is to minimize the impact on production, including VM stun, as much as possible. We achieve this for both virtual and physical workloads by using the following three methods:
- Flash-Speed Data Ingest: One way we adhere to our “do no harm” principle is through a high-speed data ingestion engine that can easily handle large volumes of data. Data enters Rubrik through the flash tier and lands on spinning disks, minimizing the time that we spend communicating with production infrastructure.
- Policy-Driven Automation: The user controls when and how often Rubrik takes backups by setting up SLA policies that indicate the times when production systems carry the least load. This SLA-based scheduling lets backups complete faster by taking advantage of idle cycles on the primary system.
- Intelligently Distributed Workflow Management: Lastly, Rubrik takes backups only when it detects that the production system is not under heavy load. Users can set thresholds for metrics such as CPU utilization and I/O latency, either globally or on a per-object basis. Rubrik monitors the performance of all resources required for a backup, and once they fall within these user-defined thresholds, we know it’s safe to take a backup.
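To make the threshold idea concrete, here is a minimal Python sketch of a load check with global defaults and per-object overrides. It is an illustration only, not Rubrik's implementation: the names `Thresholds`, `Metrics`, `effective_thresholds`, and `safe_to_back_up`, along with the sample limits, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Thresholds:
    """User-allowed limits (hypothetical fields, for illustration)."""
    max_cpu_percent: float
    max_io_latency_ms: float


@dataclass(frozen=True)
class Metrics:
    """A point-in-time reading for one protected object."""
    cpu_percent: float
    io_latency_ms: float


def effective_thresholds(
    object_id: str,
    global_thresholds: Thresholds,
    overrides: Dict[str, Thresholds],
) -> Thresholds:
    # A per-object setting wins; otherwise fall back to the global one.
    return overrides.get(object_id, global_thresholds)


def safe_to_back_up(metrics: Metrics, thresholds: Thresholds) -> bool:
    # Every monitored metric must be within its limit.
    return (
        metrics.cpu_percent <= thresholds.max_cpu_percent
        and metrics.io_latency_ms <= thresholds.max_io_latency_ms
    )


# Assumed sample values: a global limit plus a stricter one for "db-01".
GLOBAL = Thresholds(max_cpu_percent=70.0, max_io_latency_ms=20.0)
OVERRIDES = {"db-01": Thresholds(max_cpu_percent=40.0, max_io_latency_ms=10.0)}

reading = Metrics(cpu_percent=55.0, io_latency_ms=8.0)
web_ok = safe_to_back_up(reading, effective_thresholds("web-01", GLOBAL, OVERRIDES))
db_ok = safe_to_back_up(reading, effective_thresholds("db-01", GLOBAL, OVERRIDES))
```

With the same reading, `web_ok` is true under the global limits, while `db_ok` is false because the stricter per-object CPU limit applies.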
For VMware environments, we check the CPU utilization of the VM relative to how much the host has allocated to it, as well as the I/O latency of its datastores. Other VMs sharing these resources affect the performance of the VM in question, and vice versa: the act of taking a backup can affect the performance of other VMs. Rubrik executes these workflows transparently, with no action required from the user, while still keeping them informed through a system of alerts and notifications.
Let’s take a look at an example. In Rubrik, the user sets an SLA policy, defining frequency and retention. Let’s say the SLA frequency is the following:
- Every 4 hours and keep for a day
Additionally, the user specifies to take backups only in this window:
- Every day from 2am to 8am
We then schedule a backup every 4 hours, skipping any time that falls outside the allowed window. Additionally, if we detect that the production system is under high load, we reschedule the backup for a time when it will have less impact on production.
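The example above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Rubrik's scheduler: the function names, the fixed `retry_delay`, and the rule of pushing an out-of-window backup to the next window opening are all hypothetical, and the window is assumed to start and end on the same calendar day.

```python
from datetime import datetime, time, timedelta


def next_allowed(candidate: datetime, start: time, end: time) -> datetime:
    """Return candidate if it is inside [start, end), else the next window opening."""
    if start <= candidate.time() < end:
        return candidate
    opening = datetime.combine(candidate.date(), start)
    if candidate >= opening:
        # Today's window has already closed; use tomorrow's.
        opening += timedelta(days=1)
    return opening


def next_backup_time(
    last_backup: datetime, frequency: timedelta, start: time, end: time
) -> datetime:
    # The SLA frequency proposes a time; the allowed window may push it later.
    return next_allowed(last_backup + frequency, start, end)


def reschedule_for_load(
    now: datetime, retry_delay: timedelta, start: time, end: time
) -> datetime:
    # Hypothetical deferral policy: try again after a fixed delay, still
    # honoring the allowed window.
    return next_allowed(now + retry_delay, start, end)


WINDOW_START, WINDOW_END = time(2, 0), time(8, 0)
EVERY_4H = timedelta(hours=4)

# A backup taken at 2am is followed by one at 6am, inside the window.
second = next_backup_time(datetime(2024, 1, 1, 2, 0), EVERY_4H, WINDOW_START, WINDOW_END)
# Four hours after 6am is 10am, outside the window, so wait for the next 2am.
third = next_backup_time(second, EVERY_4H, WINDOW_START, WINDOW_END)
```

Under these assumptions the policy yields two backups per day, at 2am and 6am, and a backup deferred for load still never runs outside the allowed window.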
Our data management system is the “brains” that handles lifecycle management for mission-critical workloads from ingest to retirement. We built intelligence into the system to monitor the load of the production system and perform backup jobs at times that minimize the impact on production. Deploying a data management platform with policy-driven automation gives users ease of management: create an SLA policy in minutes by selecting the frequency of backups, their retention, and some hints as to when production environments are least busy. Let us handle the rest.
For more information on how Rubrik simplifies backup for physical and virtual environments, see our data sheet.