The majority of Rubrik customers take advantage of our CloudOut capability to archive backup data to a public cloud service provider, often as a replacement for tape. Cloud offers better response times than tape when you must restore data from archive. The result is a cost-effective, reliable approach to long-term retention.
Through the lens of Zaffre Fashion Group (Zaffre), our own field-tested fictional enterprise based on how customers commonly use our product, we’ll examine how a CloudOut to Amazon S3 solution can be architected.
Solution Overview
As part of their cloud-first strategy, Zaffre leverages CloudOut to support compliance, eliminate tape, and meet long-term retention requirements. Zaffre's cloud adoption efforts began in 2017 with the selection of AWS as the cloud archive location for storing on-premises data for a regulatory period.
The long-term retention location is specified as a cloud archive within the Rubrik SLA Domain. This means that once a workload is protected, Rubrik automatically ensures that:
- Snapshots that have exceeded their local retention threshold are marked for archive.
- Snapshots and metadata are uploaded to the archive location.
- Snapshots are marked for expiry once the archive retention threshold is exceeded.
- Archive consolidation efficiently manages and removes expired snapshots.
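The lifecycle above can be illustrated with a small sketch. This is not Rubrik's implementation: the function name, the state labels, and the 30-day local and 7-year archive thresholds are assumptions chosen only for illustration. In practice the thresholds come from the SLA Domain.

```python
from datetime import datetime, timedelta

# Hypothetical retention thresholds; real values are set in the SLA Domain.
LOCAL_RETENTION = timedelta(days=30)
ARCHIVE_RETENTION = timedelta(days=365 * 7)


def snapshot_state(taken_at: datetime, now: datetime) -> str:
    """Classify a snapshot by its age against the two retention thresholds."""
    age = now - taken_at
    if age > ARCHIVE_RETENTION:
        return "expired"   # past archive retention, removable by consolidation
    if age > LOCAL_RETENTION:
        return "archive"   # past local retention, kept in the cloud archive
    return "local"         # still within local retention


now = datetime(2020, 1, 1)
for age_days in (10, 90, 3000):
    print(age_days, snapshot_state(now - timedelta(days=age_days), now))
```

The point of the sketch is that the state of a snapshot follows from its age and the SLA Domain alone, which is why no manual archive or expiry step is needed.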
The following SLA Domain configuration demonstrates how Zaffre’s IT staff meets the service level agreement retention requirements.
The solution covered in this post meets all of the stakeholders' functional requirements, including:
- Recovery time from archive should not exceed the maximum tolerable downtime for applications.
- Solution should allow for a single file to be retrieved from cloud archive to minimize egress charges.
- Amazon S3 bucket designated for long-term retention should not be used for any other purpose.
- Precautions should be taken to limit access to the Amazon S3 bucket.
- Data should be encrypted both in-flight and at-rest.
- Resource deployment, configuration, and state should be managed using infrastructure as code principles.
Let’s take a look at how Zaffre meets these requirements. A sample configuration for this solution can be found on GitHub.
Solution Architecture
In order to standardize the automation framework across multiple cloud platforms as well as Rubrik, Zaffre selected Terraform as their automation tool. Keep in mind that there are a number of automation tools that could be used to fulfill this requirement. A later post will cover the automation options in more detail.
Amazon S3 was selected for the storage type, allowing users to instantly search for files across all snapshots, including those in the cloud archive. This design decision delivers rapid file-level recovery for all points in time. The CloudOut design is modular and capable of being provisioned any number of times within a single region and across regions. Therefore, it is important to tag everything and ensure a consistent naming standard.
Zaffre applies tags to map cloud resources to cost allocation and associated business logic, such as a Rubrik SLA Domain. The naming scheme designates in which region this IaaS layer is deployed, as indicated by the usw1 suffix. The mabel prefix indicates which application stack will be archived to the Amazon S3 bucket. The tag specifies use case and environmental information.
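A naming and tagging convention like this is easy to enforce in code. The sketch below is illustrative only: the helper names, the tag keys, and the tag values are hypothetical, and only the mabel prefix, the usw1 suffix, and the two resource names printed at the end come from this post.

```python
def resource_name(app: str, kind: str, region_code: str) -> str:
    """Join the application prefix, resource kind, and region suffix."""
    parts = [app, kind, region_code]
    if not all(part and part == part.lower() for part in parts):
        raise ValueError("name parts must be non-empty and lowercase")
    return "-".join(parts)


def resource_tags(app: str, use_case: str, environment: str, sla_domain: str) -> dict:
    """Build a tag map. The tag keys are hypothetical examples."""
    return {
        "Application": app,
        "UseCase": use_case,
        "Environment": environment,
        "RubrikSLADomain": sla_domain,
    }


print(resource_name("mabel", "iam-svc", "usw1"))
print(resource_name("mabel", "kms-s3", "usw1"))
print(resource_tags("mabel", "cloudout", "production", "gold"))
```

Generating names from one function keeps every resource in a deployment consistent, which matters once the same module is provisioned several times across regions.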
The following graphic demonstrates a logical overview of the CloudOut solution. The required design elements for a CloudOut solution include an IAM User, policies limiting access, data encryption keys, an Amazon S3 bucket, and a Rubrik cluster.
Zaffre’s access design adheres to the principle of least privilege. This means that the AWS IAM service account used in our design is dedicated to CloudOut operations, and its IAM policies contain only the permissions required for these operations. The account is limited in scope to:
- Access resources within a single region.
- Access only the specific resources needed for CloudOut for a single Rubrik cluster.
- Perform encrypt/decrypt operations without being able to manage the key itself.
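The three scope limits above map naturally onto an IAM identity policy. The following is a minimal sketch, not the policy Zaffre or Rubrik ships: the bucket name, the key ARN, the account ID, the us-west-1 region, and the exact action list are all assumptions made for illustration, and the permissions CloudOut actually needs should be taken from Rubrik's documentation.

```python
import json


def cloudout_policy(bucket: str, kms_key_arn: str, region: str) -> dict:
    """Build an IAM policy document scoped to one bucket, one key, one region."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # Deny-by-omission: requests to any other region do not match the Allow.
    region_condition = {"StringEquals": {"aws:RequestedRegion": region}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ArchiveBucketOnly",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
                "Condition": region_condition,
            },
            {
                "Sid": "ArchiveObjectsOnly",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": f"{bucket_arn}/*",
                "Condition": region_condition,
            },
            {
                # Usage actions only: no key administration or deletion.
                "Sid": "UseKeyButNotManageIt",
                "Effect": "Allow",
                "Action": [
                    "kms:Encrypt",
                    "kms:Decrypt",
                    "kms:GenerateDataKey",
                    "kms:DescribeKey",
                ],
                "Resource": kms_key_arn,
                "Condition": region_condition,
            },
        ],
    }


policy = cloudout_policy(
    bucket="mabel-s3-archive-usw1",  # hypothetical bucket name
    kms_key_arn="arn:aws:kms:us-west-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder
    region="us-west-1",  # assumed from the usw1 suffix
)
print(json.dumps(policy, indent=2))
```

Every statement names a specific resource rather than a wildcard, so the account cannot reach another bucket or key even if one is created in the same region.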
This diagram provides a visual representation of the access infrastructure.
It’s important to take all precautions to secure an Amazon S3 bucket. Zaffre has chosen to enable the Block Public Access settings to prevent any public access to the bucket. Rubrik does not support S3 versioning because versioning does not allow snapshot deletion, so it is not enabled on the Amazon S3 bucket. This screenshot demonstrates the bucket configurations:
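The bucket settings described above can also be expressed as a small check. In this sketch the four field names follow the S3 PublicAccessBlockConfiguration structure, while the function name and the finding messages are hypothetical. It is a way to reason about the desired state, not a replacement for reading the live bucket configuration.

```python
# Desired state: all four Block Public Access settings turned on.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def bucket_findings(public_access_block: dict, versioning_status: str) -> list:
    """Return deviations from the bucket settings described in this post.

    versioning_status is "Enabled", "Suspended", or "" if never configured.
    """
    findings = [
        f"{name} is not enabled"
        for name in PUBLIC_ACCESS_BLOCK
        if not public_access_block.get(name, False)
    ]
    if versioning_status == "Enabled":
        findings.append("versioning is enabled")
    return findings


print(bucket_findings(PUBLIC_ACCESS_BLOCK, ""))
print(bucket_findings({"BlockPublicAcls": True}, "Enabled"))
```

An empty list means the bucket matches the design. Any entry is a deviation worth investigating before the bucket is used as an archive location.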
For more CloudOut hardening considerations, be sure to check out the Security Hardening Rubrik CloudOut for AWS technical white paper.
Additional Considerations
Rubrik CloudOut performs client-side encryption of the archival data in one of two ways: through AWS KMS integration or using a user-created RSA key. For this solution, Zaffre chose AWS KMS with customer-managed encryption keys (CMK). Because AWS KMS does not reveal to the user the key-encryption keys used to encrypt the data, the only entity known to the user is the CMK ID. The design also incorporates key aliases to further abstract away the underlying CMK.
AWS KMS provides granular separation of duties, enabling the choice of which IAM user manages key administration versus routine usage. The service account, mabel-iam-svc-usw1, has key user access to the CMK, mabel-kms-s3-usw1. This means that the service account can be used for cryptographic operations only. Key rotation is automated using a separate workflow that rotates all keys on a schedule. For more information about encrypting data in an Amazon S3 bucket, take a look at “Encrypting Your Data in the Cloud: Rubrik CloudOut with Amazon S3.”
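The separation between key administration and key usage can be sketched as a KMS key policy with two statements. This is an illustration under stated assumptions: the account ID is a placeholder, the administrator role name is hypothetical, and the action lists mirror the administrator and user groupings AWS uses in its default key policies rather than anything specific to Rubrik. A production key policy typically carries additional statements, which are omitted here.

```python
def key_policy(admin_arn: str, user_arn: str) -> dict:
    """Build a key policy that separates key administrators from key users."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Administrators manage the key but get no cryptographic actions.
                "Sid": "KeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": admin_arn},
                "Action": [
                    "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*",
                    "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*",
                    "kms:Get*", "kms:Delete*", "kms:TagResource",
                    "kms:UntagResource", "kms:ScheduleKeyDeletion",
                    "kms:CancelKeyDeletion",
                ],
                "Resource": "*",  # in a key policy, "*" means this key
            },
            {
                # Users perform cryptographic operations only.
                "Sid": "KeyUsage",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": [
                    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
                    "kms:GenerateDataKey*", "kms:DescribeKey",
                ],
                "Resource": "*",
            },
        ],
    }


policy = key_policy(
    admin_arn="arn:aws:iam::111122223333:role/example-kms-admin",  # hypothetical
    user_arn="arn:aws:iam::111122223333:user/mabel-iam-svc-usw1",  # placeholder account
)
for statement in policy["Statement"]:
    print(statement["Sid"], statement["Principal"]["AWS"])
```

Because the service account appears only in the usage statement, a leaked credential can encrypt and decrypt but cannot disable, delete, or re-policy the key.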
Zaffre’s solution includes the use of Rubrik Archive Consolidation, which efficiently deletes expired snapshots by consolidating expired snapshot sub-chains into unexpired snapshots. This feature must be enabled within the advanced settings of the cloud archive location. In the same location, users must also configure Cloud Compute settings by specifying the VPC, Subnet, and Security Groups where Rubrik Bolt instances can be launched on-demand to perform consolidation. Bolt is a lightweight on-demand instance spun up to provide compute capabilities in the cloud to run Archive Consolidation.
In this case, Zaffre chose to use a private subnet within an existing VPC used for IaaS services supporting the Mabel application stack. However, a new security group for Bolt is provisioned specifically to allow TCP ports 2002 and 7780, as shown in the following image.
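The Bolt security group rules can be sketched as data in the shape the EC2 API uses for IpPermissions. Only the two TCP ports come from this post. The direction (ingress), the source CIDR, and the rule description are assumptions, so confirm the required direction and source against Rubrik's documentation before applying anything like this.

```python
import ipaddress

BOLT_PORTS = (2002, 7780)  # TCP ports named in this post


def bolt_ingress_rules(source_cidr: str) -> list:
    """Build one TCP ingress rule per Bolt port for a single source range."""
    ipaddress.ip_network(source_cidr)  # raises ValueError on a malformed CIDR
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [
                {"CidrIp": source_cidr, "Description": "Rubrik cluster to Bolt"}
            ],
        }
        for port in BOLT_PORTS
    ]


# Hypothetical on-premises range for the Rubrik cluster.
for rule in bolt_ingress_rules("10.10.0.0/24"):
    print(rule["IpProtocol"], rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

Restricting each rule to a single port and a known source range keeps the group as narrow as the rest of the access design.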
The last design consideration concerns monitoring, logging, and auditing. Even with best efforts to adhere to least privilege principles, attackers will always try to access data. CloudTrail can be used to monitor the S3 bucket used as an archive location. Server Access Logging is also enabled on the bucket to deliver access records as text-based log files.
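Both logging mechanisms can be described as configuration data. The sketch below assumes the shapes used by the CloudTrail event selector and the S3 bucket logging status structures. The bucket names and the log prefix are hypothetical, and the post does not state which event types Zaffre actually records.

```python
def s3_data_event_selector(bucket: str) -> dict:
    """CloudTrail event selector that records object-level activity on a bucket."""
    return {
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [
            # The trailing slash covers every object in the bucket.
            {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket}/"]}
        ],
    }


def server_access_logging(target_bucket: str, prefix: str) -> dict:
    """Bucket logging status that delivers access logs to a separate bucket."""
    return {
        "LoggingEnabled": {"TargetBucket": target_bucket, "TargetPrefix": prefix}
    }


archive_bucket = "mabel-s3-archive-usw1"  # hypothetical archive bucket name
log_bucket = "mabel-s3-logs-usw1"         # hypothetical log bucket name

print(s3_data_event_selector(archive_bucket))
print(server_access_logging(log_bucket, f"{archive_bucket}/"))
```

Sending access logs to a separate bucket keeps the archive bucket dedicated to long-term retention, in line with the requirement that it not be used for any other purpose.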
Conclusion
This post walked you through a sample solution architecture for CloudOut to Amazon S3. CloudOut provides automated data lifecycle management for long-term retention of data in S3. Rubrik’s ability to do file-level retrieval from archive, together with features like archive consolidation, ensures that Zaffre’s design uses the cloud in the most efficient and economical manner. Lastly, and most importantly, all of this can and should be automated.