Why Legacy Backup Became a Business Risk
As a global fashion and lifestyle retailer with a large store footprint and a rapidly expanding online flagship, this customer cannot afford disruption to core systems or e-commerce platforms. Any prolonged outage or failed recovery would be immediately visible to customers, because online checkouts, store systems, and digital experiences are tightly interconnected.
When I first engaged with their team, they were relying on IBM Spectrum Protect as their enterprise backup solution. Over time, as new platforms, databases, and applications were added, the environment evolved into a highly complex architecture that was:
- Difficult to operate and maintain across a heterogeneous, globally distributed environment
- Dependent on deep, product-specific expertise
- Increasingly misaligned with modern requirements for cyber resilience and rapid recovery
They also wanted backup to fit into an automation-first operating model. Limited integration with existing infrastructure-as-code tooling, plus a lack of centralised visibility, made it hard to onboard new workloads consistently and to prove they could recover quickly from a cyber incident.
Practitioner Tip: “Before you touch tooling, write down what good looks like for you in three bullets, for example automated onboarding, predictable recovery, and one place to manage policies. Use that as the filter for every platform decision.”
Why We Moved to Rubrik for Cyber Resilient Backup
Our goal was to simplify backup operations while raising the level of cyber resilience for business-critical applications, retail systems, and online presences worldwide. After a structured evaluation and proof of concept, we chose Rubrik because it addressed several pain points at once:
- Unified, simplified management
Rubrik provides a single, intuitive interface to manage backups across all workloads, including physical, virtual, database, and cloud. This significantly reduces operational overhead and removes the need for multiple, disconnected tools.
- Built-in security and resilience
Rubrik’s immutable architecture, together with capabilities such as anomaly detection, threat hunting, threat monitoring, and Data Security Posture Management, creates a strong defensive layer against ransomware and insider threats.
- Automation and extensibility
The platform’s API surface, together with Ansible, allowed us to fully automate host onboarding and SLA assignment. Backup became part of existing infrastructure-as-code workflows instead of a separate manual process.
- Global scalability with consistent SLAs
Rubrik supports a multi-cluster global deployment with centralized management. SLA Domains drive protection policies, so we can enforce consistent standards across a broad, distributed environment.
- Cloud integration and long-term retention
With Rubrik integrated into the customer’s cloud strategy, including Cloud Vault for long-term retention, they gained an offsite, logically separated copy of data as part of standard workflows, which improves cyber resilience and simplifies archival.
What made Rubrik stand out was the combination of simplicity, security, automation, and global scale in one platform.
What We Actually Protect
The environment is broad and heterogeneous, typical for a mature retailer with strong physical and digital channels. It includes:
- Operating systems and infrastructure
  - Linux, Windows, AIX
  - VMware vSphere
  - NetApp storage
  - Databases (Oracle, SAP HANA, MaxDB, MSSQL)
- Application domains
  - Business-critical enterprise applications
  - Retail and store systems that support in-store operations
  - E-commerce and digital experience platforms that power online shops and web presences worldwide
- Cloud and SaaS
  - Microsoft Azure for IaaS and PaaS workloads
  - Microsoft 365
- Security and monitoring
  - Splunk for centralized logging and monitoring
  - Rubrik security features for anomaly detection, threat hunting, threat monitoring, and posture analytics
Rubrik sits across this entire stack as the unified backup, recovery, and security layer. Each major application group, including enterprise systems, store infrastructure, and global online shops, is mapped to an SLA that reflects its criticality and recovery expectations.
Inside the Architecture: SLAs, Automation, and Cloud
From the outset, we designed the Rubrik deployment around a streamlined policy model and automation. Instead of hundreds of individual backup jobs, we built around a small number of SLA Domains that define:
- Backup frequency
- Local retention on Rubrik clusters
- Archival and replication behavior, including cloud-based long-term retention
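To make the three policy dimensions above concrete, here is a minimal Python sketch of how a policy could be recorded and sanity-checked before it is applied. The class name, field names, and validation rules are illustrative assumptions for this article, not Rubrik's actual SLA Domain schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SlaDomain:
    """Illustrative policy record. Field names are assumptions, not Rubrik's schema."""
    name: str
    backup_every_hours: int                       # backup frequency
    local_retention_days: int                     # retention on the local cluster
    archive_after_days: Optional[int] = None      # None means no cloud archival
    archive_retention_days: Optional[int] = None  # retention of the archived copy
    replicate: bool = False                       # copy snapshots to a second cluster

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the policy is consistent."""
        problems = []
        if self.backup_every_hours <= 0:
            problems.append("backup frequency must be positive")
        if self.local_retention_days <= 0:
            problems.append("local retention must be positive")
        if self.archive_after_days is not None:
            # Assumed rule: a snapshot must still exist locally when archival starts.
            if self.archive_after_days > self.local_retention_days:
                problems.append("snapshots expire locally before archival starts")
            if self.archive_retention_days is None:
                problems.append("archival enabled without an archive retention")
            elif self.archive_retention_days <= self.archive_after_days:
                problems.append("archive retention must exceed the archival delay")
        return problems


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of a policy.
    gold = SlaDomain("gold", backup_every_hours=4, local_retention_days=30,
                     archive_after_days=14, archive_retention_days=365, replicate=True)
    print(gold.name, gold.validate())
```

Keeping policies as small, validated records like this is one way to review them in version control alongside the rest of the infrastructure code.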
Key design choices:
- Few, well-defined SLAs
Each SLA covers an entire application group or tier of criticality, for example e-commerce production versus internal systems. This minimises configuration drift and makes it easier to show that similar workloads receive the same level of protection.
Practitioner Tip: “Aim for as few SLAs as you can reasonably justify. If you end up with more than a handful of core policies, you are probably encoding exceptions, not standards.”
- Automation via Ansible and Rubrik APIs
Together with the customer, we developed Ansible playbooks that:
  - Discover new hosts and workloads, including virtual machines, physical servers, and cloud workloads
  - Register them with Rubrik and assign the appropriate SLA automatically
  - Log onboarding and policy assignment into Splunk for visibility and audit
- Automation via Terraform for Cloud
- Embedded security policies
Security features such as anomaly detection and threat monitoring are enabled at the platform level, so every newly onboarded workload automatically benefits from the same cyber resilience posture.
This model allows the customer’s team to focus on strategic improvements and new projects instead of manual backup administration.
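The real onboarding logic lives in the customer's Ansible playbooks, which are not reproduced here. As a rough illustration of the discover, register, assign, and log flow described above, the following Python sketch shows the shape of that logic. The tier tags, SLA names, event fields, and the `register` callable are all hypothetical stand-ins, and no real Rubrik or Splunk call is made.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from an inventory tier tag to an SLA Domain name.
SLA_BY_TIER = {
    "ecommerce-prod": "gold",
    "store-systems": "silver",
    "internal": "bronze",
}
DEFAULT_SLA = "bronze"


def choose_sla(host: dict) -> str:
    """Pick the SLA name from the host's tier tag, falling back to a default."""
    return SLA_BY_TIER.get(host.get("tier", ""), DEFAULT_SLA)


def onboard(hosts, already_protected, register):
    """Register unprotected hosts and return one audit event per newly onboarded host.

    `register(hostname, sla_name)` is a caller-supplied callable that stands in
    for the real platform call (an assumption made for this sketch).
    """
    events = []
    for host in hosts:
        name = host["name"]
        if name in already_protected:
            continue  # idempotent: rerunning the job changes nothing for known hosts
        sla = choose_sla(host)
        register(name, sla)
        events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": "backup_onboarding",
            "host": name,
            "sla": sla,
        })
    return events


def to_log_line(event: dict) -> str:
    """Serialize an audit event as one JSON line, a format log platforms can ingest."""
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    inventory = [
        {"name": "web01", "tier": "ecommerce-prod"},
        {"name": "db01", "tier": "internal"},
    ]
    for event in onboard(inventory, already_protected={"db01"},
                         register=lambda host, sla: None):
        print(to_log_line(event))
```

Two design points carry over to any real implementation: the SLA is derived from inventory metadata rather than chosen by hand, and every assignment leaves an audit record.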
How We Rolled It Out in Practice
The deployment itself was straightforward. Once the first Rubrik clusters were in place, the smoothest part of the project was host and workload onboarding, because we had fully automated that step with Ansible. Through infrastructure as code, we could bring Linux, Windows, and AIX systems, databases, and virtual machines under protection without repetitive manual work.
The most challenging area was cluster sizing and capacity planning for such a diverse workload mix. Different application domains and data growth patterns required careful analysis and iterative testing. During the proof of concept and early rollout, we worked closely with the customer and Rubrik engineers to balance utilization and performance while keeping headroom for future Azure workloads such as additional storage and DevOps-related services within Rubrik Security Cloud.
By the end of the rollout, we had a set of right-sized clusters, a small number of well-designed SLAs, and an automated onboarding process that keeps new workloads protected with minimal effort.
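For readers facing the same sizing question, a first-pass estimate can be scripted before any vendor sizing exercise. The Python sketch below uses a deliberately simple model: compound annual growth, a flat daily change rate, a single data reduction ratio, and a utilization ceiling for headroom. The model and every parameter name are assumptions for illustration. It is not the method used in this project and not a Rubrik sizing tool.

```python
def required_usable_tb(front_end_tb: float,
                       daily_change_rate: float,
                       retention_days: int,
                       annual_growth: float,
                       years: int,
                       reduction_ratio: float,
                       max_utilization: float) -> float:
    """Rough usable capacity (TB) needed at the end of the planning horizon.

    Simplified model (all assumptions):
      grown    = front_end_tb * (1 + annual_growth) ** years
      logical  = grown * (1 + daily_change_rate * retention_days)  # one full + incrementals
      physical = logical / reduction_ratio                         # dedupe and compression
      result   = physical / max_utilization                        # keep headroom
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    grown = front_end_tb * (1 + annual_growth) ** years
    logical = grown * (1 + daily_change_rate * retention_days)
    physical = logical / reduction_ratio
    return physical / max_utilization


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB protected, 2% daily change, 30 days retention,
    # 20% yearly growth over 3 years, 2:1 reduction, 80% utilization ceiling.
    estimate = required_usable_tb(100, 0.02, 30, 0.20, 3, 2.0, 0.8)
    print(f"Estimated usable capacity: {estimate:.1f} TB")
```

Running the estimate per application domain, rather than once for the whole estate, makes it easier to see which workloads dominate the result and where iterative testing is worth the effort.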
Before and After: What Changed Day to Day
In fashion retail, customers never see the backup platform, but they immediately feel it if orders cannot be placed, stores cannot transact, or digital campaigns send shoppers to an unavailable site. Making backup predictable and policy-driven has become part of how this retailer protects both revenue and brand experience.
If You Are in a Similar Situation, Here Is What I Would Do
For teams running similarly complex, global environments, especially in retail or e-commerce, there are a few clear takeaways:
- Start with automation. Design your Rubrik rollout around APIs and tools like Ansible so onboarding and policy assignment are automatic from day one.
- Keep SLAs simple but comprehensive. A small, well-thought-out set of SLA Domains is easier to manage and audit than many special-case policies.
- Invest time in sizing and modelling. Do the hard work upfront on capacity planning, data growth, and RPO or RTO requirements to avoid performance surprises later.
- Integrate security from the beginning. Turn on anomaly detection, threat monitoring, and posture management early so they become part of normal operations, not an afterthought.
- Think of backup as a platform. When you treat Rubrik as an API-driven platform that is tied into your monitoring and automation stack, it becomes a strategic enabler for resilience and compliance, not just an insurance policy.
What Comes Next
The customer is already planning the next phase of their Rubrik deployment, focusing on building on the efficiency, security, and automation achieved so far. Key future initiatives include:
- Expansion to additional cloud workloads: Integrating Azure Blob storage and Azure DevOps pipelines to extend SLA-driven protection and orchestrated recovery to new cloud-native services.
- Advanced security use cases: Using Rubrik’s security stack for more proactive anomaly response and automated threat remediation to further strengthen cyber resilience.
- Data analytics and insights: Achieving actionable insights into data usage, compliance, and recovery readiness through deeper integration with Splunk and broader use of Rubrik APIs.
- Global scale optimization: Ensuring consistent performance and efficient utilization with ongoing capacity and performance tuning as the environment grows.
The next phase aims to move the backup platform from simply protecting data to becoming an operational foundation that actively supports business agility and resilience.
Contributed by

Marcel Keil
System Engineer


