Architecture

Converged Data Management Unwrapped – API-Driven Architecture

Welcome to the third post in the Converged Data Management series: API-Driven Architecture. To best understand how this property impacts the modern data center, it’s important to take a step back and view how today’s services – meaning the set of servers that make up an application offered to the business or its clients – are created, consumed, and retired. This is often referred to as lifecycle management. As a service is instantiated, there are a number of lifecycle milestones to reach. These include the details needed to request a service from a portal or catalog, the provisioning tasks to build the service, various care and feeding events while the service runs, and ultimately the retiring and archival aspects of the service. More often than not, these milestones are wrapped into various orchestration engines and front-ended by a Cloud Management Portal (CMP) for administrative and tenant-based consumption. In other scenarios, automation workflows are still driven by point-and-click scripts and customization tasks. An API-Driven Architecture comes into play as IT professionals work to progress a service through the various lifecycle milestones along with other third-party tools and infrastructure stacks. As technical teams attempt to piece together…

Converged Data Management Unwrapped – Infinite Scalability

In the second part of my series on Converged Data Management, I’m putting another property under the microscope – Infinite Scalability. The underlying premise is that the fabric providing data management can be deployed in a shared-nothing manner with a limitless architecture focused on linear growth. Whoa, what does that all mean? Let’s pick these ideas apart, one by one. A shared-nothing system is one built of a series of nodes that have no dependency upon each other. If a node fails, or parts of a node fail, the fabric remains healthy and operational without penalty. Ideally, this architecture extends beyond the node itself to the enclosure, rack, or even entire data centers. Contrast this to systems that are reliant upon dependencies and use alternative tricks to hide or protect them – load balancers, failover clustering, and so forth. If a failure occurs, performance suffers due to the need to ingest data into a central choke point – such as a master server, quantity of proxy nodes, or a database instance. Availability is also put at risk, especially considering that most components in a dependency chain have only a single failover counterpart because of…

Converged Data Management Unwrapped – Software Converged

As a technologist, I find a certain amount of joy in learning about what’s new in the market. This could be as simple as a new code or hardware release for a solution that has been around for a while, driving me to absorb the new features. Or, on the other end of the spectrum, it could be an entirely new way of thinking about how to tackle a challenge within the data center by learning from startups who are looking to shake things up with an interesting idea. A common pain point is figuring out exactly what it is that the young company does, as often the technical bits get blended in with a bunch of marketing jargon and buzzwords (see my buzzword bingo parody video) that dilute the transfer of information. In this series, I’m going to get a little nerdy about a relatively new term that is hitting the market, Converged Data Management, and dive into the five important properties that make up this solution. The goal is to provide a clear view into each feature with real meat on the bone, along with why and how the solution is important in today’s modern data center,…

Managing and Monitoring SLA Domains at Global Scale

In my previous post, I went into the complexities that funnel into building Service Level Agreements (SLAs) that exist between consumers and providers of an IT service. This friction can be greatly eased by decoupling the agreed-upon policy’s intent from the actual execution of backup jobs. It allows administrators to abstract away much of the low-level fuss required to build and maintain data protection, instead focusing on adding value at a more strategic level across the organization. Let’s now move the story forward to discuss how consumers can easily determine if their SLAs are being honored. At a high level, SLA Domains are constructed using Recovery Point Objective (RPO) and retention values. The RPO is essentially asking how much data loss the consumer is willing to tolerate, while the retention input determines where the provider will store data (on-premises or elsewhere). To understand SLA compliance, it’s important to look at the entire set of backup jobs to ensure all facets of the RPO are being met for an application. This goes beyond looking at the number of total backups held by the system, as an RPO is often expressed as a quantity of hourly, daily, weekly, monthly, and yearly…
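
To make the compliance idea concrete, here is a minimal sketch of a single-facet RPO check. It is not how SLA Domains are implemented; the function name `meets_rpo` and its parameters are hypothetical, and it assumes the RPO can be treated as a maximum allowed gap between consecutive snapshots (and between the latest snapshot and the present).

```python
from datetime import datetime, timedelta
from typing import List


def meets_rpo(snapshots: List[datetime], rpo: timedelta, now: datetime) -> bool:
    """Return True if no gap between consecutive snapshots, or between the
    latest snapshot and `now`, exceeds the RPO (hypothetical helper)."""
    if not snapshots:
        # With no snapshots at all, any amount of data could be lost.
        return False
    points = sorted(snapshots) + [now]
    return all(later - earlier <= rpo for earlier, later in zip(points, points[1:]))


# Illustrative data: snapshots taken every four hours.
base = datetime(2015, 1, 1)
four_hourly = [base + timedelta(hours=h) for h in (0, 4, 8)]
```

Counting backups alone would not catch a gap: three snapshots exist in both of the checks one might run against `four_hourly`, yet the result depends on how much time has passed since the last one.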

Introducing Atlas, Rubrik’s Cloud-Scale File System

Rubrik is a time machine for virtualized infrastructure. It periodically takes snapshots of an enterprise’s virtual machines (VMs), allowing for instant recovery to a previous point in time. In order to efficiently store the entire history of a VM, it represents each snapshot as a delta, with each delta containing only the data that has changed since the previous snapshot. Atlas is Rubrik’s Cloud-Scale File System at the foundation of the storage layer. It was designed specifically to support the time machine paradigm. Conceptually, Atlas stores a set of versioned files: each VM is a file, and each snapshot a version of that file. Internally, Atlas stores each snapshot as its own file, with the first being a full copy of the VM and each subsequent one an incremental delta from the previous. One of the reasons we chose to create Atlas was to have fine-grained control of replica placement in order to co-locate related data. Almost all operations in our system involve operating on the content of a VM. In order to make these operations efficient, Atlas places entire file replicas on specific nodes, rather than spreading the data blocks backing the file randomly throughout the system. Put another…
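
The full-copy-plus-deltas layout described above can be illustrated with a small sketch. This is not Atlas code: the `Snapshot` type, the `materialize` function, and the block-index representation are all assumptions made for illustration, with each snapshot modeled as a mapping from block index to block contents.

```python
from typing import Dict, List

# Hypothetical representation: block index -> block contents.
Snapshot = Dict[int, bytes]


def materialize(chain: List[Snapshot], version: int) -> Snapshot:
    """Rebuild the VM image at `version` by layering deltas over the full copy."""
    if not 0 <= version < len(chain):
        raise IndexError(f"version {version} not in chain of length {len(chain)}")
    image: Snapshot = dict(chain[0])      # version 0 is the full copy
    for delta in chain[1:version + 1]:
        image.update(delta)               # changed blocks overwrite older ones
    return image


# Illustrative chain: one full copy followed by two incremental deltas.
chain: List[Snapshot] = [
    {0: b"boot", 1: b"data-v0", 2: b"logs-v0"},  # full copy of the VM
    {1: b"data-v1"},                              # only block 1 changed
    {2: b"logs-v2"},                              # only block 2 changed
]
```

Reading any version touches every file in the chain up to that point, which is one way to see why keeping a VM's snapshot files together on the same nodes helps.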

Decoupling Policy from Execution with SLA Domains

Successful enterprise architects are able to pull functional design elements from key stakeholders to abstract requirements, constraints, and risks. Much of this work involves translating business needs into technology decisions and then deciding upon the right vendor solutions to provide for the design. In this blog post series, I’m going to focus on addressing Service Level Agreements (SLAs) to ensure that the business is equipped with the runway it needs to tackle operational challenges and protect applications. Many organizations that I’ve consulted with were forced to take a good, hard look at their SLAs (or lack thereof) in order to craft a strategic plan for the future. At the heart of any quality SLA is fairness. Both parties – the consumer and the provider of a service – must agree on a mutually beneficial statement for long-term success. The end goal is to abstract the minutiae of a technical design away from the consumer. Take this WordPress platform, for example: I really don’t concern myself with the back-end infrastructure; I just want to consume the service and know that it’s being protected. An SLA is a method for me to define guard rails around data loss and availability while…

Building an Infinitely Scalable Time Machine

A lot has changed in how we manage our personal data (e.g. photo albums) over the last decade. It used to require a lot of effort to store and protect our albums – you’d need to transfer photos from a camera to a computer, burn them to CDs or copy them to external hard disks, and buy more storage as space ran out. Accessing albums was cumbersome – you had to locate the right media and manually browse through the albums to locate the desired photo. Today, we deal with none of the complexity from years past. We take pictures with our phone, and our photos are automatically backed up and managed in the cloud, which never runs out of space. Any photo can be accessed instantly from any device, anywhere in the world. Unfortunately, businesses still deal with all the complexity of managing their data that we, as consumers, faced in the past. In fact, business data is far more complicated than photo albums. Businesses have databases that are constantly changing and need to access both current and past versions of their data. Furthermore, businesses have stringent requirements around the availability and security of their data. Storing data is hard as…

The Need for Scale

For almost two decades, I’ve been advising clients on their storage and backup and recovery system architectures with solutions that are limited in scalability. The need for scale only becomes more paramount over time and often sneaks up on you. Data generation and consumption grow at exponential rates. Data volume and, consequently, the need for scale are driven by three leading factors: Company maturation – Information scales alongside company growth. Economies of scale – Economies of scale bring cost savings. For example, Managed Services Providers (MSPs) are aggregators of multiple SMB environments. Data explosion – Big Data, mobile cloud era, IoT. Need I say more? Now try to imagine AT&T Stadium at max capacity (80,000 people) after a Cowboys game, with just one exit hall. We’ll never get out of here. No thank you. This is what it’s like to use the backup and recovery solutions available in the market today. I have spent the last few years setting up multiple media agents, servers, global dedupe pools that don’t dedupe across each other because they are siloed, or even separate resource pools for dedupe data and snapshot data (it’s the same data!). It was the same story over, over, and…
