The fourth week of the course covered fault tolerance and availability. The main purpose of the week was to understand and manage network traffic to a MongoDB cluster or replica set.
Availability during maintenance
The week started with a few videos on doing maintenance on a replica set. One such task would be upgrading the MongoDB server, perhaps from 2.4 to 2.6. Depending on the application’s requirements and permitted downtime, pulling down nodes in a production cluster for maintenance proved more complex than I thought.
The key to minimising the impact of nodes going down is proper planning. It starts with scheduling the update during a maintenance window. Then you make a backup plan, a good practice that usually seems a waste of time until you need to roll back. Once the time is right, you roll through your nodes and perform the updates. It’s important to update the primary last. Stepping down the primary should be the most painful part of the entire operation, because there will be a short window of downtime until a new primary is elected.
After watching the videos I believe the following steps will yield the best result:
- Log in to the node
- If this is the primary, step it down and wait for the election to finish
- Issue a shutdown command (for instance `db.shutdownServer()` from the shell)
- Do maintenance; there are several paths, but the most important thing is that during maintenance the node must not be started with production parameters. This means a different port, but also that it shouldn’t get the `replSet` parameter, so it runs as a standalone instance
- Restart the node with production parameters
- Wait for it to catch up and move on to the next node
These steps should get you through any maintenance on live systems without worrying too much about the availability of the replica set.
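The ordering rule behind these steps, secondaries first and the primary last, can be sketched in a few lines of Python (a hypothetical helper of my own, not part of the course material):

```python
def maintenance_order(members):
    """Return replica set members in the order they should be taken down
    for rolling maintenance: all secondaries first, the primary last.

    `members` is a list of dicts like {"host": "...", "state": "PRIMARY" or "SECONDARY"}.
    """
    secondaries = [m["host"] for m in members if m["state"] == "SECONDARY"]
    primaries = [m["host"] for m in members if m["state"] == "PRIMARY"]
    return secondaries + primaries

order = maintenance_order([
    {"host": "db1:27017", "state": "PRIMARY"},
    {"host": "db2:27017", "state": "SECONDARY"},
    {"host": "db3:27017", "state": "SECONDARY"},
])
print(order)  # the primary ends up last
```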
Then a frequently asked question was answered: can you use a load balancer in front of MongoDB? From what I understood, you can try, but most of the time this will cause problems. Clients with an in-progress session, for instance one still reading data from a cursor, will get an error because this kind of state isn’t shared between nodes. I gritted my teeth and went on with the next videos.
What followed was a bunch of videos about connection options on various drivers, with a focus on the Java driver. Those videos made me doze off a little, so when I got to the videos about managing connections I was shocked at how quickly connections and memory usage (every connection reserves roughly 1 MB) stack up.
Connections come in from everywhere. There are clients connecting to the primary node, and in some configurations to secondaries too; there are heartbeats between all nodes in a replica set; there are a few connections between a node and the node it is synchronizing from; and finally a monitoring agent might connect to all nodes as well. Most of this traffic goes to the primary, because clients default to it for reads and always connect to it for writes.
In a cluster there is even more traffic: connections from `mongos` servers, connections to config servers and connections between shards to balance out the data all add up quickly. More connections mean more network traffic and more memory needed to manage those connections, which leaves less memory for data. A new problem has arisen. Then the videos were kind enough to mention that once heartbeat responses slow down, elections might be held and the new primary might get just as overwhelmed as the old one did. Scary stuff.
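To get a feel for how this stacks up, here is a back-of-the-envelope calculation. The roughly 1 MB per connection is from the course; the counts and the breakdown are made-up example values of mine:

```python
def estimate_primary_load(clients, secondaries, monitoring_agents,
                          mb_per_connection=1):
    """Rough estimate of connections arriving at a primary and the memory
    they consume. Assumes one heartbeat connection and one sync-related
    connection per secondary, plus monitoring agents."""
    connections = (
        clients              # application clients talking to the primary
        + secondaries        # heartbeat connections from each secondary
        + secondaries        # nodes synchronizing from the primary
        + monitoring_agents  # monitoring agents connecting to the node
    )
    return connections, connections * mb_per_connection

conns, mb = estimate_primary_load(clients=500, secondaries=2, monitoring_agents=1)
print(conns, mb)  # 505 connections, ~505 MB of connection overhead
```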
This week taught me that the way to get more predictability is to enforce a maximum number of connections on the `mongos` nodes. By assigning an explicit value, the `mongos` process acts as a sort of safeguard for the system and prevents the number of connections to a primary from becoming a factor in availability problems.
During the course a formula was given to calculate this setting for the `mongos` process. The presented formula is as follows:
(desired max. connections on primary - (number of secondaries * 3) - (number of other processes * 3)) / number of expected mongos processes
Here the variable for other processes is defined as the number of processes other than `mongod`, config servers or mongo clients; basically, any monitoring agents.
The first thing I didn’t like was the magic number 3. It popped out immediately, and I couldn’t think of any reason for it other than it standing for the number of nodes in a replica set. Then there’s probably a reasonable amount of what, for lack of a better term, I call dark connections: network connections that exist in the system but aren’t covered in the material of the course. I guess that’s why there’s mention of a safeguard percentage on the first variable in the formula.
If you follow the recommendations in the homework assignment, you end up with 90% of the desired max. connections as your first variable. It’s a rough guess that gives you at least some buffer. In the end you should adjust the parameters based on what your monitoring tools tell you.
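Putting the formula and the 90% safeguard together gives something like the sketch below. The formula and the safeguard are from the course; the concrete numbers (1000 desired connections, 2 secondaries, 1 monitoring process, 4 `mongos` processes) are example values I made up:

```python
def max_connections_per_mongos(desired_max_on_primary, secondaries,
                               other_processes, mongos_count,
                               safeguard=0.9):
    """Connection limit per mongos, following the course formula: subtract
    the overhead of secondaries and other processes (3 connections each)
    from the safeguarded desired maximum, then divide over the mongos
    processes."""
    budget = desired_max_on_primary * safeguard
    budget -= secondaries * 3
    budget -= other_processes * 3
    return int(budget // mongos_count)

limit = max_connections_per_mongos(1000, secondaries=2, other_processes=1,
                                   mongos_count=4)
print(limit)  # (900 - 6 - 3) / 4, rounded down to 222 per mongos
```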
There was a recap of the configuration options for read preferences. Sadly it didn’t tie in with the rest of the videos in a meaningful way for me.
Rollbacks are automated when the amount of rollback data is less than 300 MB. When the amount of rolled-back data exceeds that limit, the node will not automatically re-join the replica set. This means you should somehow be able to monitor that: 1) a rollback is needed, and 2) whether it occurred automatically or manual intervention is needed.
After a rollback some data is left behind on the node; it has to be examined manually, because it was thrown out of the data set when the node re-entered the replica set.
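The 300 MB threshold is simple enough to encode in a monitoring check. A sketch, assuming the limit stated above; the function name is my own invention:

```python
ROLLBACK_AUTO_LIMIT_MB = 300  # at or above this, the node won't re-join on its own

def rollback_needs_intervention(rollback_size_mb):
    """True when the amount of rollback data reaches the limit for
    automatic rollback, meaning an operator has to step in."""
    return rollback_size_mb >= ROLLBACK_AUTO_LIMIT_MB

print(rollback_needs_intervention(50))   # False: rolls back automatically
print(rollback_needs_intervention(450))  # True: manual intervention needed
```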
This video seemed like another loose end to me, but it came with a plug for a blog post on how to simulate a rollback, and that method is still relevant for version 2.6. In the video a different approach was used, because it’s possible to create a `ReplSetTest` in the Mongo shell, which has some handy tricks to speed up the setup of various replica set configurations.
Last stretch of the week
There was a video on the states a `mongod` process can be in. At the moment there are eleven. Some feedback was given on them, which I’ll probably forget if I don’t read it back several times. Then it was on to the homework.
I noticed the answer video for the first homework assignment (4.1) was incorrectly linked to a quiz in the course. Others noticed it as well. I did try the assignment after all, but the excitement of finding the answer by myself was gone. This led to a fit about professionalism and such, though I kept the rant to myself (the neighbours probably didn’t hear my thoughts). On to the other assignments it was.
It wasn’t too difficult to complete this week, despite my belief that there’s a bug in the MongoProc tool. Somehow I managed to get a node stuck trying to roll back data. It turned out that all the parameters of the node matched what the running replica set expected, but the data didn’t. The new node kept looking for a common point it was never going to find. I guess this experience is the same as falling off the oplog.
I am glad that it’s possible to watch and learn at my convenience, because if this were a course with set times it would be much harder to finish at the moment.