I’ve had the opportunity to speak with many users about their plans for public cloud vendors and disaster recovery (DR). Specifically, users ask me about how they can use AWS or Azure as the DR target for their on-premises environment. This is also a topic that Tim Carr addresses in his recent Gestalt IT blog post.
In the first of two blog posts, we will examine the different DR options in public cloud.
Why Use Public Cloud for DR?
The traditional approach to DR requires significant investment of time and resources. At minimum, users must consider how they would replicate their primary infrastructure to a secondary site. That secondary site needs to be procured, installed, and maintained. During normal operations, the secondary site will typically be under-utilized or over-provisioned.
The cost of such an investment is beyond the means of many companies. Even for companies with the means, DR is seen as a sunk cost that delivers little return quarter over quarter. However, not having an adequate DR strategy is also something no company can afford.
The public cloud offers a way for companies of all sizes to build DR sites with little upfront cost through a pay-as-you-go model.
Options for Disaster Recovery in the Cloud
Every major public cloud vendor offers multiple options for building a DR site in its cloud. AWS, for example, describes four options, or scenarios, that can also be built with the other public cloud vendors. Each option comes in at a different price point and delivers a different Recovery Time Objective (RTO, how quickly service must be restored) and a different Recovery Point Objective (RPO, how much recent data the business can afford to lose).
Companies can choose the option that best meets their RTO and RPO requirements and budget. In general, public cloud enables customers to build solutions with better RTO and RPO at a lower cost than a secondary DR site.
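The selection described above can be sketched as a small lookup: list the four options in order of increasing cost and pick the cheapest one whose RTO and RPO fit the requirement. The hour figures and cost ranks below are placeholder assumptions for illustration, not vendor figures; only the ordering (cost rises as RTO and RPO shrink) comes from the scenarios discussed in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrOption:
    name: str
    rto_hours: float   # assumed typical time to restore service
    rpo_hours: float   # assumed typical window of data loss
    cost_rank: int     # 1 = cheapest, 4 = most expensive


# Placeholder values for illustration only; real figures depend on the workload.
OPTIONS = [
    DrOption("Backup and Restore", rto_hours=24.0, rpo_hours=24.0, cost_rank=1),
    DrOption("Pilot Light", rto_hours=4.0, rpo_hours=1.0, cost_rank=2),
    DrOption("Warm Standby", rto_hours=1.0, rpo_hours=0.25, cost_rank=3),
    DrOption("Hot Site", rto_hours=0.1, rpo_hours=0.0, cost_rank=4),
]


def cheapest_option(max_rto_hours: float, max_rpo_hours: float) -> Optional[DrOption]:
    """Return the lowest-cost option that meets both targets, or None."""
    candidates = [
        o for o in OPTIONS
        if o.rto_hours <= max_rto_hours and o.rpo_hours <= max_rpo_hours
    ]
    return min(candidates, key=lambda o: o.cost_rank, default=None)


if __name__ == "__main__":
    choice = cheapest_option(max_rto_hours=2.0, max_rpo_hours=0.5)
    print(choice.name if choice else "No option meets the targets")
```

With these placeholder numbers, a requirement of a two-hour RTO and a 30-minute RPO rules out the two cheaper options and lands on Warm Standby. In practice the table would be filled in from a company's own measurements and quotes.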
Backup and Restore
Traditionally, companies have used off-site backup tapes as their primary means of restoring data in the event of a disaster. This typically involves retrieving tapes from cold storage and recovering the data either once the primary facility has been restored or after the tapes have been sent to a cold secondary site that is powered on only when a disaster occurs.
Companies have started to use public cloud storage services such as Amazon S3 and Azure Blob Storage as alternatives to archiving tape at an off-site facility. Not only is this more cost-effective, it also delivers better RTO and RPO, since the data is already in the cloud where it can be used to launch a DR site on demand.
Source: White paper: “Using Amazon Web Services for Disaster Recovery” – 2014
There are various approaches for transferring data from the user’s on-premises infrastructure to the public cloud. These include migration tools specific to a particular cloud vendor, as well as vendor-neutral data management platforms such as Rubrik.
In a disaster, users create cloud resources, restore their data to them, and launch new server instances/VMs to run production workloads in the cloud.
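As a rough sketch of the transfer step, the snippet below walks a local backup directory and copies each file to object storage. It assumes an S3-style client exposing `upload_file(Filename, Bucket, Key)`, which is the signature boto3's S3 client provides; the function name, directory, bucket, and key prefix are hypothetical. It is not how Rubrik or any vendor tool moves data.

```python
from pathlib import Path
from typing import List


def upload_backups(s3_client, backup_dir: str, bucket: str,
                   prefix: str = "dr-backups") -> List[str]:
    """Upload every file under backup_dir and return the object keys written.

    s3_client is assumed to expose upload_file(Filename, Bucket, Key),
    as boto3's S3 client does.
    """
    root = Path(backup_dir)
    keys = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories; only files become objects
        # Preserve the directory layout under a common prefix in the bucket.
        key = f"{prefix}/{path.relative_to(root).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

With boto3 installed and credentials configured, a call might look like `upload_backups(boto3.client("s3"), "/var/backups", "example-dr-bucket")`; both the path and the bucket name are placeholders. Passing the client in as a parameter keeps the function easy to exercise against a stub before pointing it at a real bucket.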
Pilot Light
The Pilot Light option is named after the pilot light in a gas heater, a small flame that is always on and can quickly ignite the furnace. With this approach, a minimal copy of the production environment is maintained in the cloud. Core components whose state must be maintained and updated, such as a production database, run continuously in the cloud and are synced regularly with production. Other servers can either be provisioned in the cloud but kept turned off until a disaster, or be maintained as server images from which instances/VMs are launched when needed.
Compared to the Backup and Restore option, the Pilot Light scenario offers a better RTO since the core components are already running in the cloud and servers are already provisioned or ready to be provisioned. It also offers better RPO since core services are regularly updated and synced with production. However, the cost is typically higher.
Warm Standby
The Warm Standby option requires a scaled-down copy of production to be provisioned and run continuously in the cloud. As with Pilot Light, stateful core components are updated and synced regularly with production. In addition, a subset of the production servers runs continuously as instances/VMs in the cloud and can be scaled up as needed.
Compared to the previous two options, the Warm Standby scenario offers a better RTO since the core components are already running in the cloud and critical servers are already provisioned and running. In a disaster, production traffic for critical workloads can be redirected to the cloud while additional instances/VMs are launched to take on additional workloads. The Warm Standby option also offers better RPO since core services are being regularly updated and synced with production. The cost is higher than the earlier two options since more resources are provisioned and continuously running.
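To make the scale-up step concrete, here is a back-of-the-envelope sketch of how many additional instances would need to be launched at failover. The instance counts, the function name, and the idea of sizing the standby as a fixed fraction of production are illustrative assumptions, not figures from the AWS white paper.

```python
import math


def instances_to_launch(production_count: int, standby_fraction: float) -> int:
    """Return how many extra instances are needed to reach production capacity.

    standby_fraction is the share of production capacity that runs
    continuously in the cloud (0.0 to 1.0).
    """
    if production_count < 0:
        raise ValueError("production_count must be non-negative")
    if not 0.0 <= standby_fraction <= 1.0:
        raise ValueError("standby_fraction must be between 0 and 1")
    # Round the always-on footprint up so critical workloads are not undersized.
    already_running = math.ceil(production_count * standby_fraction)
    return production_count - already_running


if __name__ == "__main__":
    # 40 production servers with a 25% warm standby: 10 running, 30 to launch.
    print(instances_to_launch(40, 0.25))
```

The larger the standby fraction, the fewer instances remain to be launched during a disaster, which is the trade between continuous running cost and RTO that this option represents.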
Hot Site
As with the Warm Standby option, a copy of the production environment runs continuously in the cloud. In the Hot Site scenario, however, it is a copy of the full production environment. This allows for immediate failover during a disaster, with the cloud provisioned to run the same amount of workload as production. In addition, if core components are being updated synchronously, then the cloud can be used for production alongside the user’s on-premises infrastructure in an active-active setup.
This option has the best RTO and RPO since the user is running an exact replica of the on-premises infrastructure in the cloud. As expected, it also has the highest cost, particularly if core components in the on-premises and cloud environments are kept fully in sync.
Having laid the groundwork, our next blog post will look at how Rubrik can be leveraged to build out and enhance a user’s cloud DR option of choice.