Managing data on cloud platforms is a natural and inevitable byproduct of migrating applications and file storage to the cloud. Having your data on one or more cloud platforms, however, creates greater data management and security challenges. Compared to traditional on-premises infrastructure, the cloud introduces complexity and new risks that must be addressed if you want to protect your cloud data and maintain its integrity.
What is cloud data management? Before answering that question, it’s worth taking a moment to answer a more basic question: What is data management in general? Data management is the set of practices and processes that keep your data available, backed up, and protected from malicious actors who might damage its integrity.
Cloud data management is analogous to traditional data management, but with some key differences. If your data is held in cloud storage (sometimes on multiple cloud platforms), it may not be linked to instances of data stored on-premises or in private clouds. For example, you may run your enterprise systems and databases on-premises, but use the cloud for data backup and disaster recovery.
Cloud data management may involve the use of a purpose-built solution. Such a solution is able to handle the challenges of managing and securing data across multiple platforms and on-premises instances. A general data storage management solution may fill this role, but not every such product has the features needed to deal adequately with cloud-based data.
In this age of cloud migration and digital transformation, cloud data management is more important than ever. It enables you to get the most out of cloud computing and cloud-native applications, as well as your cloud-based backups and data repositories, without exposing your data to the unique risks of the cloud.
How is cloud data different from traditional data that’s stored on-premises? The data itself is not different. A byte is a byte, regardless of where it’s hosted. The differences arise in the contexts of access, diversity of platforms, and responsibilities.
In a traditional data environment, you most likely have your data on a few platforms that you control completely, e.g. Oracle running on Linux in a data center you either own or rent. Your people handle every aspect of managing and securing that data.
In contrast, cloud data may be spread across multiple tiers of Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) storage offerings. The data stored on these cloud platforms may include structured databases and unstructured files such as email, documents, media files and “data lakes” for analytics.
Each cloud platform has its own data management and data security features and settings. Unlike with a traditional on-premises setup, the cloud provider is responsible for managing and securing the storage infrastructure. However, you are responsible for securing your data stored in the cloud (often referred to as the shared responsibility model). There can be confusion over who takes care of what—leading to gaps in data management and security that can negatively affect the integrity and availability of your data.
The scope of cloud data storage can also be more complex than it is with traditional storage. You might run a private cloud, for example, which will have its own data management and security requirements but different assignments of responsibility. You could also have data stored in private clouds run by specialized vendors, e.g., enterprise resource planning (ERP) data stored in a managed SAP instance hosted in the data center of a third-party provider.
What emerges in such hybrid architectures is a need for simple, universal visibility into all your cloud data. You will want to know where your data is, how it’s being stored, and how it’s being protected. And, you do not want to be dealing with this challenge manually through multiple point solutions. That’s asking for trouble. You will ideally want visibility—and control—over all of your cloud data through a unified solution that provides a “single pane of glass” for cloud data management and security.
Done well, cloud data management confers a range of benefits on an organization.
Most fundamentally, you can enjoy the many benefits of the cloud without suffering any ill effects on your data. You get the cloud’s scalability and agility and the economic benefits of not having to invest capital in on-premises infrastructure or incur ongoing expenses for energy, cooling, maintenance and personnel. And your data can remain safe and well-managed.
Business resilience is another major benefit of proper cloud data management. Cloud data management companies and products typically offer automated backups and disaster recovery. They enable backup administrators to oversee and automate backups and test data recovery processes across multiple clouds and on-premises instances. If the solution provides immutable data snapshots, as Rubrik does, this can be a significant countermeasure against ransomware attacks. Updates to data storage solutions can be similarly automated.
Data quality should improve, as well. One of the risks that comes with spreading data across different cloud platforms is the potential for disaggregation of data sets. It may be difficult or impossible to know how to reconcile two duplicative records. For example, if a customer changed his address and that change is reflected in cloud database A, but cloud database B has the old address, can you be sure that database A has the most up-to-date record? A cloud data management solution can help you deduplicate and avoid data quality and integrity issues. This may be part of a master data management (MDM) feature set.
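To make the address example concrete, here is a minimal sketch of one common reconciliation rule, "last write wins." It assumes each copy of a record carries a trustworthy last-modified timestamp, which is exactly what you cannot always count on across clouds. The `CustomerRecord` type, its fields, and the source names are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    address: str
    updated_at: datetime  # assumption: every copy records when it last changed
    source: str           # hypothetical label for the database holding this copy


def reconcile(records):
    """Return one record per customer_id, keeping the most recently updated copy."""
    latest = {}
    for rec in records:
        current = latest.get(rec.customer_id)
        if current is None or rec.updated_at > current.updated_at:
            latest[rec.customer_id] = rec
    return latest


records = [
    CustomerRecord("c-1", "12 Old Road", datetime(2023, 1, 5), "cloud_db_b"),
    CustomerRecord("c-1", "98 New Street", datetime(2024, 3, 9), "cloud_db_a"),
]
merged = reconcile(records)
print(merged["c-1"].address, "from", merged["c-1"].source)
```

If the timestamps are missing or unreliable, this rule can pick the wrong copy, which is why MDM tooling typically layers additional rules on top of it.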
The cloud also has the potential to disrupt the data lifecycle. For example, if your company has a policy to delete data that is more than seven years old, it is necessary to enforce that policy across all clouds. If you don’t, you could risk having old data in storage that no one realizes is still available. Compliance problems and legal liability could result. An effective cloud data management solution helps you avoid these negative outcomes.
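As an illustration of the seven-year example, the sketch below applies one retention cutoff to an inventory that spans several clouds. The inventory format, cloud labels, and object keys are invented for the example. A real tool would build the inventory from each provider's own listing API and would delete objects rather than just report them.

```python
from datetime import date

RETENTION_YEARS = 7  # the policy from the example above


def retention_cutoff(today, years=RETENTION_YEARS):
    """Return the date before which data counts as expired."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def expired_objects(inventory, today):
    """List every object, on any cloud, created before the cutoff."""
    cutoff = retention_cutoff(today)
    return [obj for obj in inventory if obj["created"] < cutoff]


# Hypothetical inventory merged from three providers.
inventory = [
    {"cloud": "aws", "key": "invoices/2015.csv", "created": date(2015, 6, 1)},
    {"cloud": "azure", "key": "invoices/2022.csv", "created": date(2022, 6, 1)},
    {"cloud": "gcp", "key": "logs/2016.json", "created": date(2016, 2, 10)},
]
to_delete = expired_objects(inventory, today=date(2024, 1, 15))
for obj in to_delete:
    print(obj["cloud"], obj["key"])
```

The point of the sketch is that the policy is evaluated once, against a single combined inventory, rather than separately in each cloud's console.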
Cloud data management also plays a role in digital transformation. As transformation deploys software, devices, and data in a range of new configurations and hosting environments, it creates data management and security challenges. For example, if your digital transformation project involves the use of numerous Internet of Things (IoT) devices, those devices may store data on convenient cloud platforms. It’s up to you to manage and secure that data, a task that can be difficult to do without the right cloud data management tools.
On a related front, digital transformation may depend on being able to analyze diverse data sets that exist on different cloud platforms. You might want to gather data from disparate clouds onto a cloud data warehouse for that purpose. Without being able to manage a consolidated dataset, including using a data security posture management (DSPM) solution to monitor data wherever it lives, you may miss critical insights in your analysis and your transformation efforts may fall short.
Each of the major public cloud platforms offers its own cloud data management features. While the feature sets vary, at a minimum they each enable some level of data management and security on their own platform. Some (like Microsoft Purview on the Azure cloud) provide multi-cloud and on-premises data governance. For example, Purview can map data in your clouds and determine where you are hosting various data assets. From there, you can get insights into how you are managing sensitive data.
As good as such a solution may be, however, there can be a downside to using one cloud platform to manage data on another. Incompatibilities may arise, or members of your team may not be able to handle the learning curve if they are used to working with a different cloud platform.
Making cloud data management work requires careful implementation of a purpose-built toolset. One of the first steps is data discovery. This process tells you where your data is. For Carhartt, the legendary clothing maker and Rubrik customer, data discovery involved the use of Rubrik’s Sensitive Data Monitoring solution. Carhartt’s IT team used this tool to conduct precision surveillance of its data. They were able to find out which sensitive data could be compromised in an attack, and where it resided.
For cloud data management to function, there has to be a relatively simple way to integrate all the moving parts. This means APIs, and solutions like Rubrik are designed with an “API-First” architecture, built to connect cloud platforms, databases, data governance solutions, data security tools, and more. Cloud data management solutions also often contain a unified mechanism for service level agreement (SLA) policy definition and enforcement.
In the case of Carhartt, the solution comprised an integrated set of Rubrik products, including Anomaly Detection, Threat Hunting, and Sensitive Data Monitoring—integrated with Microsoft Sentinel. This setup provided Carhartt with a centralized view of all the data for all its systems.
Cloud data security is one of the more pressing cloud data management challenges. While data security and data management are separate workloads, they are closely linked. Indeed, some cloud data security tasks involve data management and vice versa. For example, understanding where your data is and how access to it is managed relates directly to both data governance and security.
Privacy is one area of cloud data security where cloud data management tools can be essential for success. This is a compliance issue much of the time. For instance, GDPR and CCPA require companies that store consumers’ personally identifiable information (PII) to track the data they have, do their best to protect it from a breach, honor requests for its deletion, and so forth. A cloud data management solution can be critical to realizing these security and compliance objectives.
Cloud data security also factors into best practices for managing data in the cloud. An effective cloud data management solution will automate the discovery and classification of sensitive data such as PII. It will also map user identities with data assets to enable data access rules. It will facilitate the enforcement of privacy policies and orchestrate controls that protect data.
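Automated discovery and classification can be pictured, in its very simplest form, as pattern matching over stored content. The sketch below is only that: two illustrative regular expressions (email addresses and US Social Security numbers) run against a few made-up documents. The paths and patterns are assumptions for the example, and commercial classifiers rely on far richer techniques than this.

```python
import re

# Illustrative patterns only; real classifiers cover many more data types.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text):
    """Return the set of PII labels whose pattern appears in the text."""
    return {label for label, pattern in PII_PATTERNS.items() if pattern.search(text)}


# Hypothetical objects sampled from three different clouds.
documents = {
    "s3://bucket-a/notes.txt": "Contact jane.doe@example.com about the renewal.",
    "azure://container-b/hr.csv": "name,ssn\nJ. Doe,123-45-6789",
    "gs://bucket-c/readme.md": "Quarterly storage report.",
}
findings = {path: classify(body) for path, body in documents.items()}
sensitive = {path: labels for path, labels in findings.items() if labels}
for path, labels in sensitive.items():
    print(path, sorted(labels))
```

Once objects are labeled this way, the labels can feed the access rules and privacy controls described above.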
Managing data in the cloud presents a host of challenges. Concerns about data privacy should be at the top of the list. A cloud data management solution should offer functionality that addresses this concern, e.g., by discovering PII and other forms of data, such as health records, that create data privacy risk.
Data loss is another potential problem that comes up, especially in hybrid or multi-cloud environments. Without careful coordination of management of cloud data sets, it is possible to overwrite data and permanently lose it. A cloud data management solution will prevent this from occurring through analysis of duplicate records and comparable processes.
Outages and data loss through cyberattacks are related challenges. A ransomware attack, for example, can destroy data by encrypting it until a ransom is paid. The further risk, however, is that the decryption process will not work. (Who said hackers are honest, anyway?) Then, your data is simply gone. Rubrik offers a solution with its immutable backups, and it integrates with Zscaler to deliver data loss prevention capabilities. Immutable backups cannot be encrypted or modified in any way, so they provide a good countermeasure to the ransomware threat.
Cloud data management can also help you deal with challenges related to the costs of data storage and management. Fees for cloud data storage can start to add up, particularly when you’re paying to store duplicative data sets on multiple clouds. Backups present a similar challenge if you aren’t staying on top of what you’re backing up. A cloud data management solution should help you determine an approach to cloud data storage that has optimal costs.
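One way to see how duplicative data sets inflate storage fees is to fingerprint each object by its content and count the bytes held more than once. The sketch below does this with SHA-256 over small in-memory samples. The object names and contents are invented, and a real tool would hash streamed objects or reuse provider-supplied checksums instead of loading data into memory.

```python
import hashlib
from collections import defaultdict


def find_duplicates(objects):
    """Group object locations whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for location, content in objects.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(location)
    return [sorted(locs) for locs in by_digest.values() if len(locs) > 1]


def redundant_bytes(objects):
    """Count bytes stored beyond the first copy of each duplicated object."""
    total = 0
    for group in find_duplicates(objects):
        size = len(objects[group[0]])
        total += size * (len(group) - 1)
    return total


# Hypothetical objects on three clouds; two of them hold the same file.
objects = {
    "aws:reports/q1.csv": b"a,b\n1,2\n",
    "gcp:archive/q1.csv": b"a,b\n1,2\n",
    "azure:reports/q2.csv": b"a,b\n3,4\n",
}
print(find_duplicates(objects))
print(redundant_bytes(objects), "redundant bytes")
```

Multiplying the redundant byte count by each provider's storage rate gives a rough estimate of what the duplicates cost you.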
The future of cloud data management is unfolding in front of us right now. One exciting development is the integration of artificial intelligence (AI) and machine learning into the cloud data management workload. It’s early in its lifecycle, but the potential is already evident for AI to improve automation and decision-making for cloud data managers. For example, AI might make it more efficient to select and store cloud data sets for use in data analytics.
Edge computing, particularly the emergence of edge clouds, also promises to change the way we manage data in the cloud. The edge inverts the cloud data management paradigm, requiring us to manage data at numerous small data center sites, versus a few large cloud platforms. It puts pressure on cloud data management solutions to provide decentralized data management.
Your data is likely on multiple clouds as well as hybrid architectures. This reality makes managing and securing your data more challenging. But it is imperative that you take cloud data management seriously. You face risks related to compliance and security, data integrity, availability, and plain old high costs if you don’t.
A cloud data management solution can help. It automates processes like data discovery, showing you where your data is stored (including sensitive data or data that’s subject to regulatory scrutiny). Backup and restore are part of the picture, too, as cloud data is vulnerable to ransomware attacks (in 2023, cloud tenants reported that they were targeted by a cyberattack every month), among other factors that can affect data availability. The future looks promising, with technologies like AI potentially making cloud data management and security more efficient and effective as the cloud becomes home to more and more data.