Today many solutions handle large volumes of data and users, such as social networks and banking systems, and they need to stay up through any eventuality: a power outage, a network equipment failure, and so on. For us as users such failures would be critical. Can you imagine that happening at the bank where you are a customer, and all your money simply disappearing? Or all the photos on your favorite social network being erased one day? In an environment this prone to failure, all of these situations can and do occur, but they are transparent to us because the service providers have implemented replication and high-availability solutions in advance to keep them from affecting us.
Replication is the process of copying and maintaining objects across multiple databases that together form a distributed system. It improves performance and protects the availability of applications by providing alternative paths to the data.
All modern database management systems offer ways to provide high availability and replication, which makes them useful when failures occur. Many, if not most, however, need third-party tools to provide a robust and efficient mechanism, and at the programming level this can make life harder for developers, since configuration and testing become tedious. The creators and contributors of MongoDB, by contrast, give us quite a simple way to get high availability and replication.
In MongoDB, replication is the way to provide high availability and fault tolerance natively and transparently to the applications that use it as their database manager. Programmers do not need to understand what happens behind the scenes; they only need to know that the mechanism is there and that it is quite robust and efficient. Replication is built on a collection of MongoDB instances, or nodes, called a replica set, in which there must always be one primary node for the set to be active.
The minimum number of nodes needed to form a replica set is three: if the primary fails, an election is triggered among the remaining nodes to choose a single replacement that continues providing the service. With only two nodes, the one survivor could not form a majority in the election, so no new primary would be selected and the set would be left inactive.
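The majority rule behind that minimum can be checked with a little arithmetic. The following is only a sketch: the function names are hypothetical, and it assumes that every node has one vote and that a candidate needs strictly more than half of the votes of the whole set.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half of the set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the nodes that are still reachable can form a majority."""
    return reachable >= majority(voting_members)


# The primary has just failed, so one node fewer is available to vote.
for size in (2, 3, 5):
    survivors = size - 1
    print(size, majority(size), can_elect_primary(size, survivors))
```

With two nodes the single survivor falls short of the two votes required, while with three nodes the two survivors are enough.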
MongoDB implements a special collection, called the "oplog" (operations log), that keeps a record of every operation that modifies data. Write operations are performed first on the primary node and then recorded in its oplog; the secondaries copy these entries and apply them asynchronously.
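To make the flow concrete, here is a toy model of it in plain Python. It is not MongoDB code: the Node class, the (timestamp, key, value) entry format, and the write and replicate helpers are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    data: dict = field(default_factory=dict)
    oplog: list = field(default_factory=list)  # entries: (ts, key, value)

    def last_ts(self) -> int:
        """Timestamp of the newest oplog entry, or 0 if the oplog is empty."""
        return self.oplog[-1][0] if self.oplog else 0


def write(primary: Node, key: str, value) -> None:
    """Apply the write on the primary, then record it in the primary's oplog."""
    primary.data[key] = value
    primary.oplog.append((primary.last_ts() + 1, key, value))


def replicate(primary: Node, secondary: Node) -> int:
    """Copy and apply every oplog entry the secondary has not seen yet."""
    seen = secondary.last_ts()
    pending = [entry for entry in primary.oplog if entry[0] > seen]
    for ts, key, value in pending:
        secondary.data[key] = value
        secondary.oplog.append((ts, key, value))
    return len(pending)


primary, secondary = Node(), Node()
write(primary, "name", "Ana")
write(primary, "city", "Caracas")
# Replication is asynchronous: until it runs, the secondary lags behind.
applied = replicate(primary, secondary)
print(applied, secondary.data == primary.data)
```

Calling replicate a second time applies nothing, because the secondary's last timestamp already matches the primary's.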
Every member of the set keeps a copy of the oplog in the collection "local.oplog.rs" so that its database stays up to date. Members send heartbeats (pings) to one another, and any member can import oplog entries from any other member of the set. So, after a failure, if node "A" comes back as a secondary after a fairly long period and the oplog on the new primary "B" has already rolled over, "A" will copy all of the data from "B".
Apart from the oplog, MongoDB uses two forms of synchronization to keep all the nodes in a replica set up to date. The first is initial sync, which loads new members with the full data set; the second is replication, which keeps the set updated after the initial sync.
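The choice between the two forms of synchronization can be sketched with another toy model. Everything here is hypothetical (the Member class, the OPLOG_SIZE cap, the sync function); it only assumes that the oplog has a fixed size, so old entries are overwritten, and that a member whose next needed entry is gone must copy everything again.

```python
from collections import deque

OPLOG_SIZE = 3  # hypothetical cap; the oldest entries are overwritten


class Member:
    def __init__(self):
        self.data = {}
        self.oplog = deque(maxlen=OPLOG_SIZE)  # entries: (ts, key, value)
        self.last_applied = None  # None means "has never synced"


def write(primary, key, value):
    """Apply a write on the primary and append it to its capped oplog."""
    ts = (primary.last_applied or 0) + 1
    primary.data[key] = value
    primary.oplog.append((ts, key, value))
    primary.last_applied = ts


def sync(source, member):
    """Bring member up to date from source and report which path was used."""
    oldest = source.oplog[0][0] if source.oplog else None
    is_new = member.last_applied is None
    is_stale = (not is_new and oldest is not None
                and member.last_applied + 1 < oldest)
    if is_new or is_stale:
        # Initial sync: copy the full data set and the current oplog.
        member.data = dict(source.data)
        member.oplog = deque(source.oplog, maxlen=OPLOG_SIZE)
        member.last_applied = source.last_applied
        return "initial sync"
    # Replication: apply only the entries that are still missing.
    for ts, key, value in source.oplog:
        if ts > member.last_applied:
            member.data[key] = value
            member.oplog.append((ts, key, value))
            member.last_applied = ts
    return "replication"


b, a = Member(), Member()
for i in range(3):
    write(b, f"k{i}", i)
print(sync(b, a))   # a is new
write(b, "k3", 3)
print(sync(b, a))   # a is one entry behind
for i in range(4, 9):
    write(b, f"k{i}", i)
print(sync(b, a))   # b's oplog rolled over while a was away
```

In this run the three calls take the initial sync, replication, and initial sync paths in that order, the last one because the entries "A" needed had already been overwritten on "B".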
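In the default configuration, writes are always directed to the primary. How a write is acknowledged, however, can be configured both on the driver connection and on individual insert or update calls, through the write concern parameters: "w" (how many members must acknowledge the write), "j" (whether to wait until the write reaches the journal), and "wtimeout" (how long to wait for the acknowledgement).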
It is important to note that while there is no primary, writes cannot be completed. There can also be cases in which mongo must roll back data when it detects an inconsistency between the node that used to be the primary and the node that became the primary. Another point to keep in mind: if the number of nodes required to acknowledge a write is greater than the number of nodes in the set, the write will wait forever unless a timeout is specified.
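One place to set the write behavior on the connection is the connection string. The sketch below builds such a string using only the standard library. The host names, the "rs0" set name, and the replica_set_uri helper are hypothetical; replicaSet, w, wtimeoutMS, and journal are connection string options. The helper also assumes the host list names every member of the set, and it refuses a numeric w larger than that list unless a timeout is given, which is the "wait forever" case described above.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, w=1, wtimeout_ms=None, journal=None):
    """Build a mongodb:// URI carrying write concern options (illustrative)."""
    if isinstance(w, int) and w > len(hosts) and not wtimeout_ms:
        raise ValueError(
            f"w={w} exceeds the {len(hosts)} listed members and no timeout "
            "is set: such a write would never be acknowledged"
        )
    options = {"replicaSet": replica_set, "w": w}
    if wtimeout_ms:
        options["wtimeoutMS"] = wtimeout_ms
    if journal is not None:
        options["journal"] = "true" if journal else "false"
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical three-member set.
hosts = ["db1.example.net:27017", "db2.example.net:27017",
         "db3.example.net:27017"]
print(replica_set_uri(hosts, "rs0", w="majority", wtimeout_ms=5000,
                      journal=True))
```

Asking for w="majority" together with a timeout is a common compromise: the write survives the loss of the primary once acknowledged, and the caller gets an error rather than blocking if the majority cannot be reached in time.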
By default, mongo sends its reads to the primary to guarantee strong consistency between the data written and the data read. This behavior can be changed in the configuration, according to your needs, through the read preference modes: "primary", "primaryPreferred", "secondary", "secondaryPreferred", and "nearest".
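How a read might be routed under each read preference can be illustrated with a small selection function. This is a simplification, not what a driver actually does: the pick_member function and the member dictionaries are hypothetical, and ties and latency windows are ignored by simply taking the eligible member with the lowest ping.

```python
READ_PREFERENCES = ("primary", "primaryPreferred", "secondary",
                    "secondaryPreferred", "nearest")


def pick_member(mode, members):
    """Return the name of the member a read would go to, or None."""
    if mode not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {mode}")
    primaries = [m for m in members if m["role"] == "primary"]
    secondaries = [m for m in members if m["role"] == "secondary"]
    eligible = {
        "primary": primaries,
        "primaryPreferred": primaries or secondaries,
        "secondary": secondaries,
        "secondaryPreferred": secondaries or primaries,
        "nearest": members,
    }[mode]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m["ping_ms"])["name"]


members = [
    {"name": "A", "role": "primary", "ping_ms": 20},
    {"name": "B", "role": "secondary", "ping_ms": 5},
    {"name": "C", "role": "secondary", "ping_ms": 12},
]
for mode in READ_PREFERENCES:
    print(mode, pick_member(mode, members))
```

Reads sent to secondaries may return slightly older data, because the secondaries apply the oplog asynchronously; that is the price of moving away from the default.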
In conclusion, MongoDB's high-availability system is quite convenient and easy to deploy, and it is robust and efficient. It gives you a distributed environment without having to worry about obscure configuration for the many components that would otherwise have to be wired together to provide a service of this kind. Moreover, the project is constantly being improved in line with programmers' needs, has a very large community behind it, and is increasingly used by companies in different sectors, all of which gives us confidence that choosing it as our database manager was a good decision.
I hope you now have a better understanding of how replication works in MongoDB. Next I'll be posting about the aggregation framework, a feature that allows GROUP BY-style query operations (as in SQL) in MongoDB, which is, yes, a NoSQL database.