Today, systems that handle large volumes of data and users, such as social networks and banking systems, must remain operational through any eventuality, such as a power outage or a network equipment failure. Imagine if such an incident occurred at your bank and all your money disappeared, or if all your photos on your favorite social network were suddenly erased.
In an environment prone to failures, these situations can and do occur. However, users rarely notice them, because service providers have implemented replication and high availability systems to keep such failures from affecting the service.
Replication is the process of copying and maintaining objects in multiple databases to create a distributed system. It can improve performance and keeps applications available by providing alternative access to the data. All modern database management systems offer mechanisms for high availability and replication, which makes them useful in cases of failure. However, many, if not most, require third-party tools to provide a robust and efficient mechanism. This can complicate matters for programmers, making setup and testing somewhat tedious.
Fortunately, the creators and contributors of MongoDB have made it relatively simple to achieve high availability and replication. In MongoDB, replication provides high availability and fault tolerance natively and transparently to the applications that use it as a database manager. This means that programmers do not need to manage what happens behind the scenes; they can rely on the database to keep the system robust and efficient.
Replication in MongoDB involves a group of instances or nodes called a replica set. A minimum of three nodes is recommended, because electing a new primary after a failure requires the votes of a majority of the set's members. With only two nodes, the loss of either one leaves the survivor with one vote out of two, which is not a majority; no new primary can be elected and the set stops accepting writes.
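The majority rule above can be sketched with a little arithmetic. This is a minimal illustration, not MongoDB code: the function names majority and faultTolerance are made up for this example, and it assumes every member holds exactly one vote.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
// Assumption: every member has exactly one vote.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of members that can fail while the rest can still elect a primary.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `${n} members: majority=${majority(n)}, tolerated failures=${faultTolerance(n)}`
  );
}
```

With two members the tolerated failure count is zero, which is why three is the practical minimum. Note also that going from three to four members does not raise the number of failures the set can survive.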
MongoDB implements a special capped collection called the "oplog" (operations log) that keeps a rolling record of all operations that modify data. Modification operations are first executed on the primary node, and then the secondary nodes asynchronously copy and apply these operations from the oplog. All members of the replica set have a copy of the oplog in the collection local.oplog.rs to keep their databases updated, so any member can import oplog entries from any other member. Members also exchange heartbeats (pings) to monitor each other's state.
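To make the oplog idea concrete, here is a toy model in plain JavaScript. It is only a sketch of the mechanism, not how MongoDB is implemented: the classes Node and Primary and the function syncFrom are hypothetical names, the timestamp is a simple counter, and updates replace the whole document, which is a simplification of what a real oplog entry stores.

```javascript
// Toy model of oplog-based replication (illustrative only).
class Node {
  constructor(name) {
    this.name = name;
    this.data = new Map();  // _id -> document
    this.lastApplied = 0;   // how many of the primary's oplog entries were applied
  }

  // Apply a single oplog entry: "i" = insert, "u" = update, "d" = delete.
  apply(entry) {
    if (entry.op === "i" || entry.op === "u") {
      this.data.set(entry._id, entry.doc);
    } else if (entry.op === "d") {
      this.data.delete(entry._id);
    }
  }
}

class Primary extends Node {
  constructor(name) {
    super(name);
    this.oplog = [];        // ordered record of every modification
  }

  // Execute the write locally first, then record it in the oplog.
  write(op, _id, doc) {
    const entry = { ts: this.oplog.length + 1, op, _id, doc };
    this.apply(entry);
    this.oplog.push(entry);
    this.lastApplied = this.oplog.length;
  }
}

// A secondary copies and applies only the entries it has not seen yet.
function syncFrom(secondary, primary) {
  const pending = primary.oplog.slice(secondary.lastApplied);
  for (const entry of pending) {
    secondary.apply(entry);
  }
  secondary.lastApplied = primary.oplog.length;
  return pending.length;
}

const primary = new Primary("A");
const secondary = new Node("B");

primary.write("i", 1, { _id: 1, name: "ana" });
primary.write("u", 1, { _id: 1, name: "Ana" });
primary.write("i", 2, { _id: 2, name: "Luis" });
primary.write("d", 2);

console.log("entries applied:", syncFrom(secondary, primary));
console.log("secondary documents:", [...secondary.data.values()]);
```

Because the secondary tracks how far it has read, a second call to syncFrom with no new writes applies nothing, which mirrors how a secondary that is already caught up has no work to do.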
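In the case of a failure, if node "A" returns as a secondary after a significant period and the oplog has advanced on the new primary "B", node "A" copies the oplog entries it missed from "B" to catch up. MongoDB implements two types of synchronization: initial sync, which populates a new member with a full copy of the data set, and replication, which continuously applies changes from the oplog afterward.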
In MongoDB, writes are directed to the primary node, and the level of acknowledgment they require can be adjusted through the write concern parameters w, j, and wtimeout.
It is crucial to note that if there is no primary node, writes cannot be completed. MongoDB may also have to roll back data when a former primary rejoins the set holding writes that were never replicated to the new primary.
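By default, MongoDB reads data from the primary to ensure strong consistency. However, this behavior can be modified according to the application's needs through the read preference modes primary, primaryPreferred, secondary, secondaryPreferred, and nearest. Because replication is asynchronous, reads served by a secondary may return slightly stale data.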
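One common place to set these read and write behaviors is the connection string an application uses. The sketch below only builds such a string; buildReplicaSetUri is a hypothetical helper written for this post, and the host list and port 27017 are assumptions based on the hostnames used later in the setup steps. The options replicaSet, readPreference, w, and wtimeoutMS are standard MongoDB connection string options.

```javascript
// Build a replica set connection string with read and write settings.
// Hypothetical helper: it only assembles text and does not open a connection.
function buildReplicaSetUri({ hosts, replicaSet, readPreference, w, wtimeoutMS }) {
  const params = new URLSearchParams({
    replicaSet,                      // name given to mongod --replSet
    readPreference,                  // where reads are routed
    w: String(w),                    // how many members must acknowledge a write
    wtimeoutMS: String(wtimeoutMS),  // how long to wait for that acknowledgment
  });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = buildReplicaSetUri({
  hosts: [
    "mongodb0.rootstack.com:27017",
    "mongodb1.rootstack.com:27017",
    "mongodb2.rootstack.com:27017",
  ],
  replicaSet: "rs0",
  readPreference: "secondaryPreferred",
  w: "majority",
  wtimeoutMS: 5000,
});

console.log(uri);
```

Listing several members in the string lets a driver discover the current primary even if one of the listed hosts is down. Whether secondaryPreferred is appropriate depends on how much staleness the application can tolerate.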
When using MongoDB applications, several aspects should be considered:
To create a replica set from the MongoDB console, follow these steps:
Start each member of the group with the same replica set name by running this command on each node:
mongod --replSet "rs0"
Initiate the replica set from one of the member consoles:
rs.initiate();
Check the configuration of the replica set:
rs.conf();
You should see a result similar to:
{ "_id" : "rs0", "version" : 1, "members" : [ { "_id" : 1, "host" : "mongodb0.rootstack.com:27017" } ] }
Add remaining instances to the replica set:
rs.add("mongodb1.rootstack.com");
rs.add("mongodb2.rootstack.com");
rs.add("mongodbN.rootstack.com");
Verify that the replica set is fully functional by checking the status:
rs.status();
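As an alternative to adding members one by one, rs.initiate() also accepts a configuration document, and the document returned by rs.status() can be checked programmatically. The sketch below is plain JavaScript that can be pasted into the shell or run on its own. The helpers buildReplicaSetConfig and looksHealthy are hypothetical, and the sample status object is hand-written for illustration; it is not real output, although stateStr and health are fields that rs.status() reports for each member.

```javascript
// Build a configuration document suitable for rs.initiate(config).
function buildReplicaSetConfig(name, hosts) {
  return {
    _id: name,
    members: hosts.map((host, i) => ({ _id: i, host })),
  };
}

// Rough health check: exactly one PRIMARY and every member reachable.
function looksHealthy(status) {
  const primaries = status.members.filter((m) => m.stateStr === "PRIMARY");
  const allUp = status.members.every((m) => m.health === 1);
  return primaries.length === 1 && allUp;
}

const config = buildReplicaSetConfig("rs0", [
  "mongodb0.rootstack.com:27017",
  "mongodb1.rootstack.com:27017",
  "mongodb2.rootstack.com:27017",
]);
console.log(JSON.stringify(config, null, 2));
// In the mongo shell this could be applied with: rs.initiate(config)

// Hand-written sample shaped like a trimmed rs.status() result (illustrative).
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "mongodb0.rootstack.com:27017", health: 1, stateStr: "PRIMARY" },
    { name: "mongodb1.rootstack.com:27017", health: 1, stateStr: "SECONDARY" },
    { name: "mongodb2.rootstack.com:27017", health: 1, stateStr: "SECONDARY" },
  ],
};
console.log("healthy:", looksHealthy(sampleStatus));
// In the mongo shell the same check could be run as: looksHealthy(rs.status())
```

A check like this is deliberately coarse. A set can still serve traffic with one member down, so a stricter or looser rule may suit a given deployment better.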
The high availability system in MongoDB is convenient, easy to deploy, robust, and efficient. It allows for a distributed environment without the need for complex configurations across numerous components. As the project continues to improve and evolve with the needs of programmers, MongoDB has earned the trust of companies across various sectors as a database management solution. In future posts, I will discuss the aggregation framework, a feature that enables SQL-like query operations in MongoDB, a NoSQL database.