
The Hacks Must Go On

Last week, Rubrik hosted its 5th Annual Hackathon. In the past, this meant that the entire Engineering Org would get together off site so we could easily share ideas and collaborate. This year, like everything else, the hackathon became fully virtual to fit the new normal of the socially distanced world we’re now living in. Hackathons are intended to spur innovation, and this year the event itself innovated and adapted to the challenges du jour, generating outstanding participation, ideation, and creativity.

What Is the Rubrik Hackathon?

Before going into details about this year’s event, I’d like to share what the hackathon means to us at Rubrik. I think this message from my good friend and Rubrik’s first-ever Sales Engineer, Eric Chang, sums it up pretty well:

“Hackathon day is my favorite day of the year. Even though we were stuck at home, I really enjoyed this. Thank you all!”

Our hackathon serves many purposes. For starters, it has been an innovation lab for product and infrastructure improvements that have translated into real-world revenue for the company and productivity gains for us engineers. To name a few, the seeds of Polaris GPS, Ransomware Recovery, and Cloud…

Building an Error Message Framework

In a fast-growing engineering team, it becomes less and less realistic for a single engineer to understand every behavior behind a product. In fact, relying on an individual to make the product consistent often leads to disastrous results, especially for features that get less attention. A well-designed framework alleviates these problems by shaping collective developer behavior and promoting better designs, and ultimately makes a better product for our customers.

Recently, I was commissioned to revamp the error message framework in Rubrik’s CDM product. Although this framework seems small compared to others, I learned a few lessons that can be applied to building frameworks of any size, including:

- Understand your customers, which include both end users and developers
- Automate as much as possible, leaving fewer chances for human error
- Rules do not matter if they are not enforced by code
- A framework should help scale the engineering organization
- Building a framework means changing a culture

What Went Wrong

At Rubrik, we pride ourselves on prioritizing our customers’ feedback and using their insights to shape our product roadmap. This was recently true when we received feedback on the quality of our error messages. For reference, here is a real-life message given to…
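As a rough illustration of the lesson that rules do not matter unless code enforces them, here is a minimal Python sketch of an error message catalog that refuses to load entries that break its rules. Everything in it is hypothetical: the `ErrorMessage` fields, the `AREA_0001` code scheme, and the three rules are assumptions made for the example, not the framework used in Rubrik's CDM product.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ErrorMessage:
    code: str     # stable identifier, e.g. "BACKUP_0001" (hypothetical scheme)
    message: str  # what went wrong, written for the end user
    remedy: str   # what the user can do next


# Hypothetical rule: codes are an upper-case area name plus four digits.
CODE_PATTERN = re.compile(r"^[A-Z]+_\d{4}$")


def validate(entry: ErrorMessage) -> List[str]:
    """Return every rule violation found in a single catalog entry."""
    problems = []
    if not CODE_PATTERN.match(entry.code):
        problems.append(f"{entry.code!r}: code must look like AREA_0001")
    if not entry.message.strip().endswith("."):
        problems.append(f"{entry.code!r}: message must be a full sentence")
    if not entry.remedy.strip():
        problems.append(f"{entry.code!r}: remedy must not be empty")
    return problems


def build_catalog(entries: List[ErrorMessage]) -> Dict[str, ErrorMessage]:
    """Build the catalog, raising ValueError if any rule is broken."""
    problems = [p for entry in entries for p in validate(entry)]
    seen = set()
    for entry in entries:
        if entry.code in seen:
            problems.append(f"{entry.code!r}: duplicate code")
        seen.add(entry.code)
    if problems:
        raise ValueError("\n".join(problems))
    return {entry.code: entry for entry in entries}
```

Calling `build_catalog` from a unit test or a build step would turn a style guideline into a failing check, so a developer learns about a missing remedy before a customer does.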

Understanding Cloud Costs

Your company runs cloud infrastructure on AWS and wants to reduce the spend. You’ve already got a Savings Plan in place and you’ve right-sized your instances, but when you look in Cost Explorer, the spend is still too big to believe. What do you do?

If you take one thing away from this post, let it be this: your objective should be to understand your costs. While reducing costs is important and valuable, reduction should come as a consequence of understanding. The value of understanding cloud costs is something we at Rubrik discovered firsthand on our journey to reduce our bills.

Below we’ll share anecdotes demonstrating our approach to achieving an understanding of our cloud spend, concrete examples of both successes and failures, and the learnings we picked up along the way. These stories will include how:

- Accurate attribution led to a speedy 10x reduction in a major account.
- Dogfooding led to a bug discovery, a fix, and corresponding savings.
- Using APIs and histograms allowed us to come to grips with reality.
- Process failure can be as important as software bugs.
- Rearchitecting resulted in a 600x reduction in CI/CD cost.

At the core of success is a model…

Scala: Concise, Clean Code for Humans

Let me ask you a simple question: which do you think is a more natural way of thinking?

1. I am going to go home and take a nap.
2. My present location is “office,” and my state of wakefulness is “awake.” I am going to change my location to “home” and then change my state of wakefulness to “asleep.”

The answer is probably a unanimous and resounding “the first one!” But when we write code, it is almost always an example of the second one.

Here at Rubrik, while expanding the frontiers of Cloud Data Management, we are also passionate about the psychology of programming and shortening the learning curve for new developers. So, we look for innovative methods to reduce the cognitive load that programmers deal with. That’s where Scala’s magic shines!

In this post, I am going to walk you through how we leverage Scala’s expressiveness to write cleaner, leaner, and more meaningful code. For this example, we’ll write a simple simulation for modeling backup operations and how they consume space. For starters, let’s simulate, using a toy program, what happens to occupied storage space when we take a snapshot:

[code listing]

No code is done without unit tests, right?…

Erasure Coding or: How Rubrik Doubled the Capacity of Your Cluster

At Rubrik, we’re big believers in data protection. But until we’re able to take consistent snapshots of our brain state and upload them to the promised hierarchical neural interconnect, we’re going to focus on backing up the more traditional machines, the ones whose smooth functioning will enable this cause.

Any complete backup solution needs a distributed, scalable, fault-tolerant file system. Rubrik’s is Atlas, which made the switch from triple-mirrored encoding to a Reed-Solomon encoding scheme during our Firefly release. To help you understand the motivation behind this change, this post introduces erasure coding and compares the two methods.

What is Erasure Coding?

Suppose we want to store a piece of data on a fault-tolerant and distributed file system. In this case, the loss of any single drive should not result in data loss. The only way to achieve fault tolerance is through redundancy, which refers to storing extra information about the data across different drives to allow for its complete recovery in the event of a failure. The more redundancy we add, the greater the fault tolerance. However, the cost of redundancy is increased storage overhead. Every file system needs to make this tradeoff between availability and overhead. At Rubrik, the…
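To make the tradeoff between fault tolerance and overhead concrete, here is a small Python sketch that compares triple mirroring with a Reed-Solomon layout. The excerpt does not state which Reed-Solomon parameters Atlas uses, so the (4, 2) layout below (4 data blocks plus 2 parity blocks) and the 120 TB raw capacity are illustrative assumptions only, and the function names are made up for this example.

```python
from fractions import Fraction


def mirroring(copies):
    """Return (storage overhead, drive failures tolerated) for n-way mirroring."""
    # Every byte is stored `copies` times; all but one copy may be lost.
    return Fraction(copies), copies - 1


def reed_solomon(data_blocks, parity_blocks):
    """Return (storage overhead, drive failures tolerated) for an RS(k, m) layout."""
    # k data blocks are stored alongside m parity blocks, and any k of the
    # k + m blocks are enough to rebuild the original data.
    return Fraction(data_blocks + parity_blocks, data_blocks), parity_blocks


def usable_capacity(raw_tb, overhead):
    """Usable capacity left after paying the redundancy overhead."""
    return raw_tb / overhead


raw_tb = 120  # hypothetical raw cluster capacity in TB
for name, (overhead, tolerated) in [
    ("3-way mirroring", mirroring(3)),
    ("Reed-Solomon (4, 2), assumed", reed_solomon(4, 2)),
]:
    usable = usable_capacity(raw_tb, overhead)
    print(f"{name}: overhead {float(overhead)}x, "
          f"tolerates {tolerated} failures, {float(usable)} TB usable")
```

Under these assumptions both layouts survive two lost drives, but mirroring stores 3 bytes per byte of data while the (4, 2) layout stores 1.5, so the same raw capacity holds twice as much data.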

Introducing Crystal, Rubrik’s Intuitive User Interface

Rubrik reinvented data management with the user in mind. By approaching data management from a user’s perspective, we’ve distilled a complex process into a click or swipe. Set up a policy to back up data from multiple sources with a user experience similar to an iPhone. Create a replication and archival schedule to public or private cloud in 30 seconds. Recover entire virtual machines, databases, and files instantly. This is end-to-end data management. Simplified.

The Consumer Experience of Enterprise

Crystal is Rubrik’s intuitive data management platform that makes all of the above scenarios a reality. It transforms complicated enterprise backup, disaster recovery, data archival, and copy data management workflows from a burden to a joy. One of the reasons we built Rubrik was to bring a consumer-grade experience to enterprise software. To provide the simplicity, ease of use, and pain-free delight of popular consumer products, such as Facebook, Google, and Dropbox, we created a team composed of engineers from both consumer and enterprise, including the builders of Google Maps, Apple iOS, and Box.

What is Crystal?

Crystal is composed of two principal components: the Crystal UI and Crystal REST API. The Crystal UI focuses on building products with usability…

10 Reasons Why You Should Intern at a Startup

All of the software engineering students out there are crazy about internships and jobs at the top tech giants: Google, Facebook, Microsoft, etc. Have you ever considered interning at a startup? If not, then read on. This post might change your perspective about startups.

I interned at Rubrik. Rubrik is a backup and storage startup based in Palo Alto, California, USA. They are a team of about 50 engineers who are top notch in their respective fields. The engineers at Rubrik are super talented and have experience building real technology.

I will list out the things that I really liked about Rubrik:

Freedom to choose project: I was initially given the option to choose one of two suggested projects. I didn’t quite like either of them, so the team members helped me to come up…

Meet Cerebro, the Brains Behind Rubrik’s Time Machine

Fabiano Botelho, father of two and star soccer player, explains how Cerebro was designed. Previously, Fabiano was the tech lead of Data Domain’s Garbage Collection team.

Rubrik is a scale-out data management platform that enables users to protect their primary infrastructure. Cerebro is the “brains” of the system, coordinating the movement of customer data from initial ingest and propagating that data to other data locations, such as cloud storage and remote clusters (for replication). It is also where the data compaction engine (deduplication, compression) sits. In this post, we’ll discuss how Cerebro efficiently stores data with global deduplication and compression while making Instant Recovery & Mount possible.

Cerebro ties our API integration layer, which has adapters to extract data from various data sources (e.g., VMware, Microsoft, Oracle), to our different storage layers (Atlas and cloud providers like Amazon and Google). It achieves this by leveraging a distributed task framework and a distributed metadata system. See AJ’s post on the key components of our system.

Cerebro solves many challenges while managing the data lifecycle, such as efficiently ingesting data at the cluster level, storing data compactly while making it readily accessible for instant recovery, and ensuring data integrity at all times. This is what…
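For readers new to the idea of a compaction engine, the following Python sketch shows deduplication and compression in their simplest generic form: data is split into fixed-size chunks, each unique chunk is compressed and stored once under its SHA-256 fingerprint, and an object is kept as a list of fingerprints. This is a textbook illustration under assumed names (`DedupStore`, `put`, `get`), not a description of how Cerebro is actually built.

```python
import hashlib
import zlib


class DedupStore:
    """Toy content-addressed store: unique chunks are compressed and kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> compressed chunk bytes

    def put(self, data):
        """Store `data` and return its recipe, the ordered chunk fingerprints."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only chunks not seen before cost any new storage.
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(zlib.decompress(self.chunks[f]) for f in recipe)

    def stored_bytes(self):
        """Total compressed bytes held across all unique chunks."""
        return sum(len(c) for c in self.chunks.values())
```

Because the chunk table is shared by everything written to the store, a second backup that repeats most of the first adds only its new chunks, which is the intuition behind the word "global" in global deduplication.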