Tagged in: SLA

Product

Rubrik, How Are My SLAs Doing?

At Rubrik, we hold a hackathon every year so that engineers have an opportunity to work on their creative ideas. Hackathon projects can include product features, productivity-related improvements, and exploratory concepts. This year was our second hackathon, which was held at the Computer History Museum in Mountain View, CA. There was a tremendous amount of excitement leading up to the day of the hackathon. As it neared, we were constantly reminded to think of ideas, form teams, and chalk out a plan of execution. Our customers love Rubrik’s simplicity. Rubrik’s UI is minimalistic, user-friendly, and intuitive. The question I asked myself was: how can I redefine the way customers interact with their Rubrik cluster? The answer: a voice-based conversational interface that would make our product even easier to use. Let’s begin by answering our customers’ most important question – “How are my SLAs doing?” So how did I set out to do this? First, I figured out what building blocks to use. Amazon Lex makes it quick and easy to build sophisticated, natural-language conversational bots. With Amazon Lex, I could define intents and train the bot to identify them. A simple example of an intent is to know the…
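
To make the shape of that integration concrete, here is a minimal sketch of a Lambda fulfillment handler for a hypothetical CheckSLACompliance intent. The intent name, the get_compliance_summary() helper, and the Lex V1-style response format are illustrative assumptions, not the actual hackathon code.

```python
# Minimal sketch of a Lambda fulfillment handler for a Lex intent.
# The intent name ("CheckSLACompliance") and get_compliance_summary()
# are illustrative assumptions, not the actual hackathon code.

def get_compliance_summary():
    """Hypothetical helper that would query the Rubrik cluster's REST API
    and summarize how the SLA Domains are doing."""
    return "All protected objects are currently meeting their SLA Domains."

def lambda_handler(event, context):
    intent = event.get("currentIntent", {}).get("name")

    if intent == "CheckSLACompliance":
        text = get_compliance_summary()
    else:
        text = "Sorry, I can't help with that yet."

    # Close the conversational turn with a plain-text answer
    # (Lex V1-style dialogAction response).
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        }
    }
```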

Architecture

Do No Harm: How We Schedule Backups Without Impacting Production

One of our principal tenets is to “do no harm” to production systems. Taking a backup draws on several resources in the primary environment, including CPU, I/O, and network bandwidth. Rubrik’s objective is to minimize production impact and VM stun. We achieve this for both virtual and physical workloads by using the following three methods:

Flash-Speed Data Ingest: One way we adhere to our “do no harm” principle is through a high-speed data ingestion engine that can easily handle large volumes of data. Data enters Rubrik through the flash tier and lands on spinning disks, minimizing the time we spend communicating with production infrastructure.

Policy-Driven Automation: The user specifies when and how often Rubrik takes backups by setting up SLA policies that signal the times when production systems have the least load. This SLA-based scheduling makes backups faster by taking advantage of idle cycles on the primary system.

Intelligently Distributed Workflow Management: Lastly, Rubrik only takes backups when it detects that the production system is not loaded. Users can set thresholds for metrics such as CPU utilization and I/O latency, either globally or on a per-object basis. Rubrik monitors…
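
As a rough illustration of the threshold idea behind the third method (not Rubrik's actual scheduler), a backup task would only be dispatched when current production metrics sit below user-defined ceilings. The metric names and default values below are assumptions.

```python
# Rough illustration of threshold-gated scheduling; the metric names and
# default ceilings are assumptions, not Rubrik's implementation.
from dataclasses import dataclass

@dataclass
class LoadThresholds:
    max_cpu_utilization: float = 0.70   # 70% CPU
    max_io_latency_ms: float = 20.0     # 20 ms I/O latency

def can_backup_now(current_cpu: float, current_io_latency_ms: float,
                   thresholds: LoadThresholds) -> bool:
    """Return True only if the production system looks idle enough."""
    return (current_cpu <= thresholds.max_cpu_utilization
            and current_io_latency_ms <= thresholds.max_io_latency_ms)

# Example: the host is busy (85% CPU), so the backup is deferred.
if can_backup_now(0.85, 12.0, LoadThresholds()):
    print("dispatch backup task")
else:
    print("defer backup until the next idle window")
```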

Architecture

Here’s Why You Should Shelve Backup Jobs for Declarative Policies

Changing out legacy, imperative data center models for the more fluid declarative models really gets me excited, and I’ve written about the two ideas in an earlier post. While the concept isn’t exactly new for enterprise IT – many folks enjoy declarative solutions from configuration management tools such as Puppet – the scope of deployment has largely been limited to compute models for running workloads. The data protection space has been left fallow, awaiting some serious innovation. In fact, this is something I hear from Rubrik’s channel partners and customers quite frequently, because their backup and recovery world has been forever changed by the simplicity and power of converged data management. To quote Justin Warren from our Eigencast recording, backup should be boring and predictable, not “exciting” because a restore failed and you’re now on the hook for missing data. That’s never fun. Thinking deeper on this brings me to one of the more radical ideas a new platform introduces: no longer needing to schedule backup jobs. Creating jobs and telling them exactly when to run, including dependency chains, is the cornerstone of all legacy backup solutions. As part of their…

Product

Pure Rubrik Goodness

As Pure//Accelerate approaches, one of my favorite aspects of winning solutions comes to mind. It’s a virtue that transforms products into MVPs, rather than the drama generators so common on the court and in the field. What is it? Simplicity. Businesses have enough knobs and pain points with tier-1 Oracle/SAP deployments and SQL, SharePoint, and Exchange farms. The last thing they need is for storage and data protection to jump on the pile. That’s why enterprises need Pure Storage and Rubrik. From the ground up, Pure and Rubrik have simplicity in their DNA. If you have a FlashArray on the floor, then you already know the freedom and ease it brings to storage infrastructure. Gone are the days of tinkering with RAID sets or tuning LUNs to squeeze out a few performance points. With a few cables and a vSphere plugin, Pure serves up datastores and gets out of the way. Rubrik brings the same unobtrusive value to data protection and is the perfect pairing for Pure. From rack & go to policy-driven automation to instant recovery, Rubrik drives straight to the point with beautiful simplicity.

Rack & Go: The first thing that stands out with Rubrik is its lean footprint – it doesn’t eat…

Architecture

Contrasting a Declarative Policy Engine to Imperative Job Scheduling

One of the topics du jour for next-generation architecture is abstraction – or, more specifically, the use of policies to allow technical professionals to manage ever-growing sets of infrastructure using a vastly simpler model. While it’s true I’ve talked about using policy in the past (read my first and second posts of the SLA Domain Series), I wanted to go a bit deeper into how a declarative policy engine is vastly different from an imperative job scheduler, and why this matters for the technical community at large. This post is fundamentally about declarative versus imperative operations. In other words:

Declarative – describing the desired end state for some object
Imperative – describing every step needed to achieve the desired end state for some object

Traditional architecture has long been ruled by the imperative operational model. We take some piece of infrastructure and tell it exactly what it must do to meet our desired end state. With data protection, this has resulted in backup tasks/jobs. Each job requires a non-trivial amount of hand-holding to function. This includes configuration items such as:

Which specific workloads/virtual machines must be protected
Where to send data and how to store that…
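
To make the contrast concrete, here is a hedged sketch in which the declarative form states only the desired end state while the imperative form spells out every step. The field and function names are illustrative, not Rubrik's actual schema.

```python
# Illustrative contrast only; names and fields are assumptions, not Rubrik's schema.
from dataclasses import dataclass

@dataclass
class SLAPolicy:
    """Declarative: state the desired end result and let the policy
    engine decide how and when to achieve it."""
    name: str
    rpo_hours: int        # tolerate at most this much data loss
    retention_days: int   # keep recovery points this long

gold = SLAPolicy(name="Gold", rpo_hours=4, retention_days=90)

def imperative_backup_job(vms, destination, start_time):
    """Imperative: the operator spells out every step -- which workloads,
    where the data goes, and exactly when the job runs."""
    steps = []
    for vm in vms:
        steps.append(f"{start_time}: snapshot {vm}")
        steps.append(f"{start_time}: copy {vm} snapshot to {destination}")
        steps.append(f"{start_time}: expire old {vm} copies by hand")
    return steps

print(gold)
print(imperative_backup_job(["web-01", "db-01"], "backup-target", "02:00"))
```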

Architecture

Managing and Monitoring SLA Domains at Global Scale

In my previous post, I went into the complexities involved in building the Service Level Agreements (SLAs) that exist between consumers and providers of an IT service. That friction can be greatly eased by decoupling the agreed-upon policy’s intent from the actual execution of backup jobs. This decoupling allows administrators to abstract away much of the low-level fuss required to build and maintain data protection and instead focus on adding value at a more strategic level across the organization. Let’s now move the story forward to discuss how consumers can easily determine if their SLAs are being honored. At a high level, SLA Domains are constructed using Recovery Point Objective (RPO) and retention values. The RPO essentially asks how much data loss the consumer is willing to tolerate, while the retention input determines where the provider will store data (on-premises or elsewhere). To understand SLA compliance, it’s important to look at the entire set of backup jobs and ensure all facets of the RPO are being met for an application. This goes beyond looking at the total number of backups held by the system, as an RPO is often expressed as a quantity of hourly, daily, weekly, monthly, and yearly…
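
As a hedged sketch of that idea, compliance can be evaluated per bucket of the schedule rather than by total backup count. The bucket sizes and schedule values below are assumptions, not a real SLA Domain configuration.

```python
# Simplified, bucketed RPO compliance check; the bucket definitions and
# example schedule are assumptions, not a real SLA Domain configuration.
from datetime import datetime, timedelta

def compliance_report(snapshot_times, now, schedule):
    """For each bucket ("hourly", "daily", ...), count the recovery points
    held inside that bucket's window and compare against the requirement."""
    bucket_sizes = {
        "hourly":  timedelta(hours=1),
        "daily":   timedelta(days=1),
        "weekly":  timedelta(weeks=1),
        "monthly": timedelta(days=30),   # simplification: a month is 30 days
        "yearly":  timedelta(days=365),
    }
    report = {}
    for bucket, required in schedule.items():
        # N required points in a bucket -> look back over N bucket-widths.
        window = bucket_sizes[bucket] * required
        held = sum(1 for t in snapshot_times if now - t <= window)
        report[bucket] = {"required": required, "held": held,
                          "compliant": held >= required}
    return report

# Example: plenty of backups overall, yet the hourly facet of the RPO is missed.
now = datetime(2017, 6, 1, 12, 0)
snaps = [now - timedelta(hours=h) for h in (1, 3, 5, 20)]
print(compliance_report(snaps, now, {"hourly": 4, "daily": 1}))
```

The point of the sketch is that compliance is judged facet by facet: the daily requirement above is met, but only two recovery points fall inside the four-hour window, so the hourly requirement is not.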

Architecture

Decoupling Policy from Execution with SLA Domains

Successful enterprise architects are able to pull functional design elements from key stakeholders to distill requirements, constraints, and risks. Much of this work involves translating business needs into technology decisions and then selecting the right vendor solutions to support the design. In this blog post series, I’m going to focus on addressing Service Level Agreements (SLAs) to ensure that the business is equipped with the runway it needs to tackle operational challenges and protect applications. Many organizations that I’ve consulted with were forced to take a good, hard look at their SLAs (or lack thereof) in order to craft a strategic plan for the future. At the heart of any quality SLA is fairness. Both parties – the consumer and the provider of a service – must agree on a mutually beneficial statement for long-term success. The end goal is to abstract the minutiae of a technical design away from the consumer. Take this WordPress platform as an example: I really don’t concern myself with the back-end infrastructure; I just want to consume the service and know that it’s being protected. An SLA is a method for me to define guard rails around data loss and availability while…