Under the Hood - How Rubrik Works
Rubrik is an intelligent Data Management stack in which each layer scales independently and is independently resistant to failures. Designed to run on-premises or in the cloud, the stack is anchored by Infinity (API and deep app awareness), Cerebro (the “brains”), and Atlas (a cloud-scale file system built from scratch).
Chris Wahl dives into how Atlas was built with Adam Gee (former staff engineer on Colossus, Google’s File System).
Infinity: The interface between the outside world and Cerebro. APIs execute SLA policies throughout the system and deliver granular control to users.
Cerebro: The brains of Rubrik. Composed of the Blob Engine and the Distributed Task Framework. Provides a data control plane detached from any underlying infrastructure.
Atlas: Cloud-scale file system designed to be masterless and self-healing. Works with Cerebro to provide instant recovery.
Creating a Control Plane, Detached from Any Infrastructure
Rubrik Blob Engine is a distributed version control system, detached from any underlying application and infrastructure (e.g., storage, on-prem, cloud). The Blob Engine can orchestrate data from on-prem to cloud, cloud to cloud, cloud to on-prem. It provides core data management services, including immutability, deduplication, retention, replication, and archival.
Chris Wahl takes you behind the scenes to see how the brains of Rubrik were built with founding lead engineer Fabiano Botelho (former Technical Director at Data Domain).
The Rubrik Blob Engine is designed to deliver instant access to data to meet today’s demands for recovery, test/dev, and analytics. It dynamically evaluates how to minimize fragmentation and latency within the snapshot to achieve near-zero recovery times, especially for applications that require a higher quality of service (e.g., Gold vs. Silver SLA). See how Rubrik instantly recovers a snapshot from 90 days ago.
The Blob Engine maintains a mapping between each content ID and a usable representation of the corresponding content until that content is deleted (the representation could be stored in the Atlas file system). Everything is stored in an immutable format, so existing data cannot be modified by ransomware.
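The content-ID mapping described above can be illustrated with a small content-addressed store. This is only a sketch under stated assumptions, not Rubrik's implementation: the `BlobStore` class, its method names, and the choice of SHA-256 as the content ID are all hypothetical.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        # content ID -> immutable bytes
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: the content ID is the SHA-256 digest of the content.
        content_id = hashlib.sha256(data).hexdigest()
        # Write once: identical content maps to the same ID, so it is stored
        # a single time (deduplication) and never overwritten (immutability).
        self._blobs.setdefault(content_id, bytes(data))
        return content_id

    def get(self, content_id: str) -> bytes:
        return self._blobs[content_id]

    def delete(self, content_id: str) -> None:
        # The mapping is kept until the content is explicitly deleted.
        self._blobs.pop(content_id, None)

    def __len__(self) -> int:
        return len(self._blobs)
```

Because the key is derived from the content itself, writing the same data twice returns the same ID and adds no new entry, which is one simple way deduplication and immutability can fall out of the same design.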
Recover a Snapshot from 90 Days Ago
To deliver dramatically lower RTOs through “Live Mounts”, Rubrik exploits its distributed system DNA. When a “Live Mount” is initiated, Rubrik issues parallel requests to the cluster nodes and underlying storage to read the distributed data concurrently. Unlike traditional rehydration, Rubrik employs parallel synthesis of the data, accelerating the time in which data can be presented back to the system for recovery or test/dev purposes.
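The parallel read pattern described above can be sketched as follows. This is a minimal illustration, assuming a snapshot split into chunks held by different nodes; the `CLUSTER` layout, node names, and the `read_chunk` and `assemble` functions are hypothetical stand-ins, not Rubrik APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: chunk index -> (node name, bytes held on that node).
CLUSTER = {
    0: ("node-a", b"snap"),
    1: ("node-b", b"shot"),
    2: ("node-c", b"-data"),
}


def read_chunk(index: int) -> bytes:
    """Stand-in for a network or disk read from the node holding this chunk."""
    _node, payload = CLUSTER[index]
    return payload


def assemble(chunk_indexes: list) -> bytes:
    """Read all chunks concurrently and join them in their original order."""
    workers = max(1, len(chunk_indexes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map issues the reads in parallel but yields results in input
        # order, so the chunks can be concatenated directly.
        return b"".join(pool.map(read_chunk, chunk_indexes))
```

Calling `assemble([0, 1, 2])` issues the three reads at once instead of one after another, so in a real system the total wait would be closer to the slowest single read than to the sum of all reads.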
Automating Policies with a Distributed Task Framework
The Distributed Task Framework globally assigns and executes tasks across the system in a fault-tolerant and efficient manner.
It schedules the activities needed to uphold the assigned SLA policies, both day to day and over the long term. Once an SLA policy is set, it plans the work required to meet that policy's goals for data retention, replication, and archival.
For example, if a user has defined prioritization requirements within the SLA (say, that a database is mission-critical), the framework continually performs data efficiency checks (data consolidation, compression, deduplication) on that data to ensure it recovers quickly.
Founding engineer Jon Derryberry discusses the benefits of a declarative policy engine (what data should exist and where in the system).
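A declarative policy states what should exist, and the system works out which tasks close the gap between that goal and the current state. The sketch below illustrates the idea only; the `SlaPolicy` fields, the `plan_tasks` function, and the task names are assumptions made for this example, not Rubrik's policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical policy: the desired state, not the steps to reach it."""
    snapshot_every_hours: int
    retain_days: int
    replicate: bool


def plan_tasks(policy, hours_since_last_snapshot, snapshot_ages_days, replicated):
    """Compare the observed state with the policy and list the tasks needed."""
    tasks = []
    # A new snapshot is due once the configured interval has elapsed.
    if hours_since_last_snapshot >= policy.snapshot_every_hours:
        tasks.append("take_snapshot")
    # Snapshots older than the retention window are due for expiry.
    for age in snapshot_ages_days:
        if age > policy.retain_days:
            tasks.append(f"expire_snapshot_age_{age}d")
    # Replication is only scheduled if the policy asks for it and it is missing.
    if policy.replicate and not replicated:
        tasks.append("replicate_latest")
    return tasks
```

With a policy such as `SlaPolicy(snapshot_every_hours=4, retain_days=30, replicate=True)`, the user never schedules jobs directly. Re-running `plan_tasks` against the latest observed state yields whatever work is currently outstanding, and an empty list once the policy is satisfied.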
Mobilize Your Data Anywhere
Unlike legacy solutions, Rubrik has integrated an API-first architecture from Day 1 and consumes the same APIs published and offered to users.
Rubrik’s APIs are designed to work in two ways:
Abstract complexity with APIs built into a self-learning system that operates efficiently (for example, adaptive throttling or automatic detection of workload characteristics to minimize the impact on production).
Deliver granular control for customers to employ workflows best suited to their environments.
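As an illustration of what adaptive throttling can look like in general, the sketch below adjusts backup concurrency from observed production latency using an additive-increase, multiplicative-decrease rule. The source does not say which algorithm Rubrik uses; the `next_concurrency` function, its parameters, and its default limits are assumptions made for this example.

```python
def next_concurrency(current, latency_ms, target_ms, floor=1, ceiling=32):
    """Return the number of parallel streams to use for the next interval.

    Assumed rule (not taken from the source): halve the stream count when
    production latency exceeds the target, otherwise add one stream.
    """
    if latency_ms > target_ms:
        # Production is under pressure: back off quickly, but keep at least `floor`.
        return max(floor, current // 2)
    # Production is healthy: probe for more throughput, up to `ceiling`.
    return min(ceiling, current + 1)
```

Backing off by half and recovering one step at a time favors the production workload: the backup load drops sharply at the first sign of contention and returns only gradually.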