Reducing the Impact of Application Stun
Application stun during the snapshot process is a topic that often comes up in customer conversations about data protection for VMware environments. To level set, application stun goes hand-in-hand with any snapshot operation: VMware stuns (briefly pauses) the virtual machine (VM) when the snapshot is created and again when it is deleted. Cormac Hogan has a great post on this here.
Producing a snapshot of a VM disk file requires the VM to be stunned, a snapshot of the VM disk file to be ingested, and deltas to be consolidated into the base disk. If you’re snapping a highly transactional application, like a database, nasty side effects appear in the form of lengthy backup windows and application time-outs when the “stun-ingest-consolidate” workflow is not efficiently managed.
When a snapshot of the base VMDK is created, VMware will create a delta VMDK. Write operations are redirected to the delta VMDK, which expands over time for an active VM. Once the backup completes, the delta VMDK needs to be consolidated with the base VMDK. Longer backup windows lead to bigger delta files, resulting in a longer consolidation process. If the rate of I/O operations exceeds the rate of consolidation, you’ll end up with application time-outs.

Rubrik was designed to dramatically diminish the effects of application stunning when backing up VMware environments. This includes:
- Flash-optimized, parallel ingest that linearly scales as more nodes are added to the cluster (faster ingest)
- Reduction of data hops due to convergence of traditionally disparate software and hardware, such as backup software, proxy servers, deduplicated storage, etc. (simpler architecture)
- Rubrik’s VSS Provider offered through VMware Tools (better application integration)
- Rubrik’s consolidation process, which throttles the number of operations to avoid overwhelming the ESXi hosts (better task management)
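To make the relationship between backup window, delta size, and consolidation time concrete, here is a rough back-of-the-envelope sketch in Python. It is a deliberately simplified model, not a description of how VMware or Rubrik actually implement consolidation: it assumes constant ingest, write, and consolidation rates, and the function names and the sample numbers are hypothetical, chosen only for illustration.

```python
import math


def delta_size_mb(base_mb, ingest_mb_per_s, write_mb_per_s):
    """Estimate how large the delta VMDK grows while the backup runs.

    Assumption: the backup window is simply base size / ingest rate, and
    the VM writes to the delta at a constant rate for that whole window.
    """
    backup_seconds = base_mb / ingest_mb_per_s
    return backup_seconds * write_mb_per_s


def consolidation_seconds(delta_mb, consolidate_mb_per_s, write_mb_per_s):
    """Estimate how long it takes to fold the delta back into the base disk.

    Assumption: new writes keep arriving during consolidation, so only the
    net rate (consolidation minus writes) shrinks the backlog. If writes
    keep pace with or outrun consolidation, the model never converges.
    """
    net_rate = consolidate_mb_per_s - write_mb_per_s
    if net_rate <= 0:
        return math.inf
    return delta_mb / net_rate


if __name__ == "__main__":
    base = 4_000_000  # roughly a 4 TB disk, expressed in MB (illustrative)
    for ingest in (500, 1000):  # hypothetical ingest rates in MB/s
        delta = delta_size_mb(base, ingest, write_mb_per_s=50)
        merge = consolidation_seconds(delta, consolidate_mb_per_s=200,
                                      write_mb_per_s=50)
        print(f"ingest={ingest} MB/s  delta={delta:.0f} MB  "
              f"consolidation={merge:.0f} s")
```

Under these assumptions, doubling the ingest rate halves both the delta size and the consolidation time, which is why faster ingest matters. The infinite result for a non-positive net rate corresponds to the case where I/O outpaces consolidation and the application starts timing out.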
As a result, our customers can take VM-level snapshots of applications with high change rates and at greater frequencies for more granular recovery. Rubrik helped Red Hawk Casino eliminate the effects of application stun for their 4TB SQL databases, allowing them to finally do VM-level backups and accumulate more recovery points with a more granular RPO. You can check out the Red Hawk case study here and download the App Consistent Snapshot data sheet here.
Stay tuned – in our next post, we’ll go more in depth on how we take application consistent snapshots.