The most popular NoSQL (“Not only SQL”) database, MongoDB continues to be the database of choice for application teams dealing with high volumes of unstructured data and next-generation cloud-native and hybrid cloud deployments. Modern NoSQL databases like MongoDB help developers bring powerful customer-facing applications into production faster than ever.

Unlike relational databases, a NoSQL database stores massive amounts of data without requiring a predefined schema. MongoDB stores its information in documents, which can be represented in JavaScript Object Notation (JSON) format. This flexible format supports fields that can vary from document to document and lets developers change data structures easily. Instead of using a standardized query language like SQL, you interact with the MongoDB server using JavaScript and a simple API.
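To make the flexible document format concrete, here is a minimal sketch in plain Python (no MongoDB server or driver involved). The customer documents and the helper names field_names and to_json are hypothetical and exist only for illustration.

```python
import json

# Two hypothetical documents destined for the same collection. They share
# some fields, but each also carries fields the other lacks.
customers = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "phones": ["555-0100", "555-0101"],
     "address": {"city": "Arlington", "state": "VA"}},
]

def field_names(doc):
    """Return the sorted top-level field names of a document."""
    return sorted(doc.keys())

def to_json(doc):
    """Serialize a document to a JSON string with stable key order."""
    return json.dumps(doc, sort_keys=True)

for doc in customers:
    print(field_names(doc), to_json(doc))
```

Neither document had to declare its fields in advance, which is what makes it easy to change data structures as an application evolves.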

One of the advantages of MongoDB is that it includes native replication. Native replication protects against media and network failures by replicating data across nodes, but it can be a serious disadvantage in the case of data corruption or loss: the corrupted data set is replicated too, exacerbating an already bad situation.
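The difference between replication and backup can be illustrated with a toy model. The sketch below is not MongoDB's replication protocol; the ReplicaSet class is a hypothetical stand-in that simply copies every write to every node, which is enough to show why a bad write ends up everywhere while a point-in-time copy is unaffected.

```python
import copy

class ReplicaSet:
    """Toy model: every write to the primary is copied to all secondaries."""

    def __init__(self, secondaries=2):
        self.primary = {}
        self.secondaries = [{} for _ in range(secondaries)]

    def write(self, key, value):
        self.primary[key] = value
        for node in self.secondaries:
            node[key] = value  # replication copies good and bad writes alike

    def snapshot(self):
        """Return an independent point-in-time copy (the 'backup')."""
        return copy.deepcopy(self.primary)

rs = ReplicaSet()
rs.write("order-1", {"total": 120})
backup = rs.snapshot()                 # taken before the bad write
rs.write("order-1", {"total": None})   # an application bug corrupts the record

corrupted_everywhere = all(n["order-1"]["total"] is None for n in rs.secondaries)
print(corrupted_everywhere, backup["order-1"])
```

Every replica now holds the corrupted record; only the separate snapshot still holds the good value.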

Native database replication isn’t a substitute for backup, nor does it help meet test/dev and data mobility requirements. Organizations deploying NoSQL-based applications require both database backup and replication. 

 

MongoDB Database Backup Requirements

To successfully protect your data that resides in MongoDB, consider the following key requirements.

Requirement #1–Application-consistent backups. Next-generation cloud applications are always-on by nature. As the application scales, the underlying MongoDB deployment also needs to scale out to multiple shards and replica sets. A NoSQL database backup solution should provide a consistent backup copy across shards and replica sets without degrading application performance during backup and without quiescing the database.

MongoDB’s native database dump option doesn’t provide application-consistent backups. It doesn’t scale, is problematic for backing up sharded clusters and replica sets, and is subject to human error. You need a consistent point-in-time backup of sharded clusters to recover cleanly from data loss.

Requirement #2–Granular backups. MongoDB groups documents, which can differ in structure and be stored in separate places, into collections. A collection is similar to a table in a relational database. Depending on the application requirements, some collections may need to be backed up every hour, while others may need to be backed up only daily. The flexibility to schedule backups at any interval with collection-level granularity is another requirement of MongoDB database backup.
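Collection-level scheduling can be sketched as a small policy check. This is an illustrative sketch only: the POLICY mapping, the collection names, and the collections_due function are assumptions made for the example, not part of MongoDB or any backup product.

```python
from datetime import datetime, timedelta

# Hypothetical policy: collection name -> backup interval.
POLICY = {
    "orders": timedelta(hours=1),
    "audit_log": timedelta(days=1),
}

def collections_due(policy, last_backup, now):
    """Return the sorted names of collections whose interval has elapsed.

    A collection with no recorded backup is always due.
    """
    due = []
    for name, interval in policy.items():
        last = last_backup.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return sorted(due)

now = datetime(2024, 1, 1, 12, 0)
last_backup = {
    "orders": now - timedelta(hours=2),     # overdue for its hourly backup
    "audit_log": now - timedelta(hours=6),  # daily backup not due yet
}
print(collections_due(POLICY, last_backup, now))
```

Here only the hourly collection is selected, so the daily collection is not backed up more often than its policy requires.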

Requirement #3–Orchestrated recovery to alternate topologies. The topology of MongoDB clusters differs at each stage of the data lifecycle. For example, in production, the application could be deployed on a sharded MongoDB cluster on-premises, but your test team might have access to only unsharded MongoDB clusters in the Amazon Web Services (AWS) Cloud. Therefore, your database backup solution should enable multiple restore operations such as sharded-to-sharded or sharded-to-unsharded. Likewise, your backup solution should support restore operations of geographically distributed clusters.

Additionally, native MongoDB backup tools have limited capability to restore to test/dev clusters. Delivering data mobility to refresh or push data into test/dev often requires restoring data to a cluster with a different topology from the source cluster. Native MongoDB backup tools must restore data to the original topology (shards/replica sets), making it difficult to create smaller test/dev clusters.
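As a rough illustration of a sharded-to-unsharded restore, the sketch below merges per-shard backup sets into a single collection. It assumes each shard's backup is already available as a list of documents with a unique _id; the function name restore_to_unsharded and the sample data are hypothetical, and a real restore would also have to handle indexes, metadata, and far larger data volumes.

```python
def restore_to_unsharded(shard_backups):
    """Merge per-shard backups into one collection ordered by _id.

    shard_backups: list of lists of documents, one inner list per source shard.
    Raises ValueError if two shards hold different documents with the same
    _id, since that would indicate an inconsistent backup.
    """
    merged = {}
    for shard in shard_backups:
        for doc in shard:
            existing = merged.get(doc["_id"])
            if existing is not None and existing != doc:
                raise ValueError(f"conflicting copies of _id={doc['_id']}")
            merged[doc["_id"]] = doc
    return [merged[key] for key in sorted(merged)]

shard_a = [{"_id": 1, "region": "east"}, {"_id": 3, "region": "east"}]
shard_b = [{"_id": 2, "region": "west"}]
restored = restore_to_unsharded([shard_a, shard_b])
print([doc["_id"] for doc in restored])
```

The conflict check matters because it only passes when the shard backups represent the same consistent point in time.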

Requirement #4–Node failure handling. Failures are common in the distributed database world. The database backup should therefore be resilient to database process failures, node failures, network failures, and even logical data corruption during backup and recovery operations. The backup solution should also be able to handle failures of MongoDB config servers that store metadata for sharded clusters.
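One simple way to tolerate a node failure during backup is to fall back to another member that holds the same data. The sketch below shows that idea under stated assumptions: NodeUnavailable, backup_with_failover, and fake_read are hypothetical names, and the read function is a stand-in rather than a real database call.

```python
class NodeUnavailable(Exception):
    """Raised by a (hypothetical) reader when a node cannot be reached."""

def backup_with_failover(nodes, read_from):
    """Try each node in turn; return (node, data) from the first that responds.

    nodes: ordered list of node names, e.g. the members of one replica set.
    read_from: callable taking a node name and returning its data, or raising
               NodeUnavailable.
    """
    failures = []
    for node in nodes:
        try:
            return node, read_from(node)
        except NodeUnavailable as exc:
            failures.append(f"{node}: {exc}")
    raise RuntimeError("all nodes failed: " + "; ".join(failures))

def fake_read(node):
    # Stand-in for a real read; the first member is simulated as down.
    if node == "node-a":
        raise NodeUnavailable("connection refused")
    return ["doc-1", "doc-2"]

print(backup_with_failover(["node-a", "node-b", "node-c"], fake_read))
```

A production solution would need more than this, for example resuming partially completed backups and detecting logical corruption, but the failover loop captures the basic expectation that one unreachable node should not fail the whole job.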

 

Rubrik’s Database Backup Solution

Rubrik Mosaic is software purpose-built to solve the challenges of backup and recovery for modern NoSQL databases and big data file systems. Chief among its benefits:

  • Application-consistent backups for sharded and unsharded database configurations

  • Faster incremental backups that make it easier to meet Recovery Point Objective (RPO) requirements measured in hours

  • Cluster-consistent versioning, or point-in-time backup copies of MongoDB collections at user-specified intervals

  • Semantic de-duplication to drastically reduce the costs of storing distributed database backups, resulting in savings of up to 70%

  • Fully orchestrated and granular NoSQL database recovery

  • Test/dev refreshes from production data with any-to-any topology restores
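To give a feel for why de-duplication matters for replicated databases, the sketch below removes identical documents that appear in several replica copies by hashing their content. This is a generic content-hash approach written for illustration, not Rubrik Mosaic's actual algorithm; the function names and sample data are assumptions, and the resulting numbers say nothing about the savings figure quoted above.

```python
import hashlib
import json

def content_key(doc):
    """Hash a document's canonical JSON form so identical content matches."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def deduplicate(replica_copies):
    """Keep one copy of each distinct document across all replicas."""
    unique = {}
    for replica in replica_copies:
        for doc in replica:
            unique.setdefault(content_key(doc), doc)
    return list(unique.values())

docs = [{"_id": i, "status": "shipped"} for i in range(4)]
replicas = [docs, list(docs), list(docs)]  # three replicas of the same data
stored = deduplicate(replicas)
total = sum(len(r) for r in replicas)
print(len(stored), total)
```

With three identical replicas, only one copy of each document needs to be stored, which is the intuition behind de-duplicating at the level of database content rather than raw files.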

To learn more about Rubrik Mosaic and NoSQL database backup, see Data Protection Built for MongoDB. Take a deep dive by reading our white paper, the Definitive Guide to Backup and Recovery for MongoDB.