Upgrading MySQL Cluster

MySQL Cluster supports online upgrades within a major release series. This means that you can upgrade from 5.0.10 to 5.0.11 with no downtime at all; however, you cannot upgrade from 4.1.x to 5.0.x without downtime.

Upgrading MySQL Cluster is very simple. The first thing you have to do is stop your current management node, upgrade it, and restart it. You then stop one storage node at a time, upgrade it, and start it up again with the ndbd --initial command, waiting for it to completely start before moving on to the next storage node.

For example, if you were to try to upgrade the sample cluster we have used so far (with one management node, two storage nodes, and a SQL node on three physical servers), the process would be as follows:

  1. Stop the management node by issuing <id> STOP in the management client, where <id> is the ID of the management node (ID 1 in the sample output from the SHOW command within the management client example earlier in this chapter).
  2. Exit the management client.
  3. Upgrade the MySQL-ndb-management package or copy the new ndb_mgmd and ndb_mgm binaries to overwrite the old ones in your binary directory.
  4. Start the new ndb_mgmd binary from /var/lib/mysql-cluster.
  5. Enter the management client and issue the SHOW command; you should see something like this:

    ndb_mgm> SHOW
    Cluster Configuration
    ---------------------
    [ndbd(NDB)]     2 node(s)
    id=2    @10.0.0.2  (Version: 5.0.10, Nodegroup: 0, Master)
    id=3    @10.0.0.3  (Version: 5.0.10, Nodegroup: 0)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1    @10.0.0.1  (Version: 5.0.11)

    [mysqld(API)]   3 node(s)
    id=4    @10.0.0.1  (Version: 5.0.10)
    id=5    @10.0.0.2  (Version: 5.0.10)
    id=6    @10.0.0.3  (Version: 5.0.10)

    Notice how the MGM node is now version 5.0.11, while all other nodes remain 5.0.10.

  6. Start upgrading storage nodes. Repeat the following process for each storage node in turn:

    • Stop the storage node by issuing <id> STOP in the management console, where <id> is the node's ID.
    • Upgrade the MySQL-ndb-storage, MySQL-ndb-tools, and MySQL-ndb-extra RPM packages or overwrite all the old ndb* binaries with the new ones.
    • Start ndbd again by using the ndbd --initial command.
    • Return to the management console and wait for the status of the node to change from this:

      id=x @10.0.0.x (Version: 5.0.11, starting, Nodegroup: 0)

      to this:

      id=x @10.0.0.x (Version: 5.0.11, Nodegroup: 0)

      Move on to the next storage node in the cluster.

When you have completed this process for all storage nodes, you have successfully upgraded your cluster. There is no need to do anything to the SQL nodes that are connected to your cluster, although normally you would want to upgrade them as well. (Upgrading the SQL nodes is no different just because they are part of a cluster, subject to the one gotcha covered earlier in this chapter: if you completely remove all the RPMs before reinstalling them, you should comment out the cluster lines in my.cnf, or the installation of the new RPMs will fail.)
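To recap, here is a minimal sketch of the rolling upgrade expressed as shell commands. The RPM file names, paths, and node IDs are assumptions based on the sample cluster, so adjust them for your installation; if your version of ndb_mgm does not support the -e option, type the same commands interactively inside the management client instead:

# Upgrade and restart the management node (run on the management host)
ndb_mgm -e "1 STOP"                               # stop the management node (ID 1)
rpm -Uvh MySQL-ndb-management-5.0.11-0.i386.rpm   # assumed package file name
cd /var/lib/mysql-cluster && ndb_mgmd             # restart from the directory holding config.ini

# For each storage node in turn: stop it from the management client...
ndb_mgm -e "2 STOP"
# ...then, on the storage node itself, upgrade the packages and restart ndbd
rpm -Uvh MySQL-ndb-storage-5.0.11-0.i386.rpm \
        MySQL-ndb-tools-5.0.11-0.i386.rpm MySQL-ndb-extra-5.0.11-0.i386.rpm
ndbd --initial      # wait for the node to finish starting before moving on to the next one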

Upgrading Through Major Versions

The procedure for upgrading across major versions (for example, from 4.1 to 5.0 or from 5.0 to 5.1) is simple but requires a short period of downtime:

  1. Enter single-user mode to prevent any changes from being made to your database while you take your backups (see Chapter 3, "Backup and Recovery").
  2. Make a backup of your cluster (see Chapter 3).
  3. Do a full SQL dump of all databases; a sketch follows this list. (There are bugs in some older versions of MySQL that prevent later versions from reading backup files produced by those versions. You don't want to find that this affects you after you have shut down your cluster and upgraded all the binaries.)
  4. Back up config.ini (in case you have to change it, for whatever reason, and forget what the original is during the upgrade process).
  5. Shut down your cluster (that is, all storage and management nodes).
  6. Copy DataDir on all storage nodes to a backup folder, such as /var/lib/mysql-cluster2; this allows a quick rollback if you need to return to the previous version.
  7. Copy all binaries that start with ndb in your MySQL bin directory to a backup folder.
  8. Upgrade all the MySQL packages on all nodes to the latest versions.
  9. Start the management daemon.
  10. Start the storage nodes by using --initial.
  11. Attempt to restore the cluster backup (see Chapter3).
  12. If step 11 fails, drop all tables and databases and restore from your SQL dump.
  13. Exit single-user mode.
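Steps 3 and 12 can be carried out with mysqldump and the mysql client on one of the SQL nodes. A minimal sketch; the dump file location is arbitrary and only an example:

# Step 3: dump all databases through a SQL node before shutting the cluster down
mysqldump --all-databases > /root/pre-upgrade-dump.sql

# Step 12 (only needed if restoring the native cluster backup fails):
# drop whatever was partially restored, then reload the dump
mysql < /root/pre-upgrade-dump.sql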

We suggest that whenever possible, you set up a test cluster on the new version to make sure your data works correctly with it. Things can change between major versions, which might mean, for example, that you need to increase some parameters in config.ini to get things to work. You probably don't want to discover this for the first time when you have to revert to the older version after a failed upgrade, or when you are searching mailing lists for a solution while your database is down.

Other Methods of Starting a Cluster

You might find it extremely inconvenient to have to log in to each machine to start ndbd, particularly if you have a very large number of nodes in your cluster and/or exotic authentication methods on the servers that make up your cluster, which can make logging in to lots of servers time-consuming and tedious. Several tricks are worth mentioning. The first is to use SSH to start all your data nodes, using ssh -t to issue a command and then log straight out. You can issue the following commands on the management server to completely start the sample cluster (which you should make sure is shut down first):

[root@s1 mysql-cluster]# ndb_mgmd
[root@s1 mysql-cluster]# ssh -t 10.0.0.2 ndbd
root@10.0.0.2's password:
Connection to 10.0.0.2 closed.
[root@s1 mysql-cluster]# ssh -t 10.0.0.3 ndbd
root@10.0.0.3's password:
Connection to 10.0.0.3 closed.
[root@s1 mysql-cluster]# ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> SHOW
Connected to Management Server at: 10.0.0.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.0.11, starting, Nodegroup: 0)
id=3    @10.0.0.3  (Version: 5.0.11, starting, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.1  (Version: 5.0.11)

[mysqld(API)]   3 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)

Then, after a few minutes, your cluster should start as usual:

ndb_mgm> SHOW
Connected to Management Server at: 10.0.0.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.0.11, Nodegroup: 0)
id=3    @10.0.0.3  (Version: 5.0.11, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.1  (Version: 5.0.11)

[mysqld(API)]   3 node(s)
id=4    (Version: 5.0.11)
id=5    (Version: 5.0.11)
id=6    (Version: 5.0.11)

If you set up SSH authentication using private/public keys, you can complete this process without using passwords, which means you can script it very easily.
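A minimal sketch of that setup from the management host, assuming OpenSSH and its ssh-copy-id helper (if ssh-copy-id is not available, append the public key to ~/.ssh/authorized_keys on each node by hand):

# Generate a key pair on the management host; use an empty passphrase
# if you want the startup to be completely unattended
ssh-keygen -t rsa

# Copy the public key to each storage node in the sample cluster
ssh-copy-id root@10.0.0.2
ssh-copy-id root@10.0.0.3

# The data nodes can now be started without password prompts
ssh -t root@10.0.0.2 ndbd
ssh -t root@10.0.0.3 ndbd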

Another trick you can use is to start the ndbd daemon on a node without actually having it join the cluster; it simply connects and waits for the management server to instruct it to start. You do this by passing the -n (nostart) option to ndbd when you start it. If you were to start both storage nodes in the sample cluster by using -n, you would get this:

ndb_mgm> SHOW
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.0.11, not started)
id=3    @10.0.0.3  (Version: 5.0.11, not started)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.1  (Version: 5.0.11)

[mysqld(API)]   3 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)

You can start these storage nodes by issuing the <id> START command in the management client:

ndb_mgm> 2 START
Database node 2 is being started.

ndb_mgm> 3 START
Database node 3 is being started.

You can extend this trick further by writing a simple shell script that detects whether ndbd is running on a node and, if it is not, "half starts" it like this so that you can always restart nodes that die from within the management client. This can make dealing with a large cluster much easier: you do not have to log in to each server to start the cluster, and as soon as nodes crash or reboot, they automatically half start, allowing you to start them completely from the management client and eliminating the need to actually log in to your storage nodes. A very simple script such as this should do the trick:

#!/bin/bash
#
# ndbd_keepalive.sh
#
# Checks that ndbd, the storage daemon for MySQL Cluster,
# is running. If it is not, start it with -n (nostart)
# to allow the administrator to start it from within the
# management client.
#
# Usage: /path/to/ndbd_keepalive.sh
#
# This script can be run from crontab every few minutes.
#
# (C) Alex Davies 2005.
# You are free to do whatever you like with this script.

# -w stops the script's own process name (ndbd_keepalive.sh)
# from matching; only a running ndbd binary should match.
ps -efl | grep -w ndbd | grep -v grep &> /dev/null
if [ "$?" != 0 ]
then
        # ndbd is dead
        # So, restart it in nostart mode
        # And, log it
        ndbd -n
        wall "ndbd_keepalive.sh restarted ndbd"
        echo -n "ndbd_keepalive.sh restarted ndbd:" >> /root/ndbd-restart-log
        date >> /root/ndbd-restart-log
fi

Note that you should make sure the ndbd line contains the full path to the ndbd binary (for example, /usr/local/mysql/bin/ndbd or /usr/sbin/ndbd) if the script does not work without it, which it may well not, particularly when it runs from cron, because cron provides only a minimal PATH.
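For example, on an RPM installation the line in the script might become the following (the exact path is an assumption; run which ndbd to find out where the binary lives on your system):

# full path so the script also works from cron's minimal PATH
/usr/sbin/ndbd -n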

You should test the script by starting up your cluster. After the cluster is started, add the crontab line (crontab -e on most Linux distributions) on each storage node, like this:

*/2 * * * * /root/ndbd_keepalive.sh

This runs the script every 2 minutes.

When the cluster is up and running, you should stop a storage node, and within 3 minutes, it should reconnect so you can start it properly from within the management client:

ndb_mgm> SHOW
Connected to Management Server at: 10.0.0.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.0.11, Nodegroup: 0)
id=3    @10.0.0.3  (Version: 5.0.11, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.1  (Version: 5.0.11)

[mysqld(API)]   3 node(s)
id=4    @10.0.0.1  (Version: 5.0.11)
id=5    @10.0.0.2  (Version: 5.0.11)
id=6    @10.0.0.3  (Version: 5.0.11)

ndb_mgm> 2 STOP
Node 2 has shutdown.

At this stage, you should wait up to 3 minutes for the script to kick in on storage node 2 and restart ndbd. Then issue a SHOW command, like this:

ndb_mgm> SHOW
Connected to Management Server at: 10.0.0.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.0.11, not started)
id=3    @10.0.0.3  (Version: 5.0.11, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.1  (Version: 5.0.11)

[mysqld(API)]   3 node(s)
id=4    @10.0.0.1  (Version: 5.0.11)
id=5    @10.0.0.2  (Version: 5.0.11)
id=6    @10.0.0.3  (Version: 5.0.11)

Notice how Storage Node 2 has now gone to the status "not started." You can now start it by using 2 START:

ndb_mgm> 2 START
Database node 2 is being started.

If the script restarts a node, it adds a log entry with the date and time to /root/ndbd-restart-log. You could also have it email an administrator so that person knows to investigate the cause of the crash and complete the startup process.
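For example, you could append something like the following inside the script's if block, assuming a working local mail setup and the mail client (the address is a placeholder):

# Tell the administrator that ndbd was restarted in nostart mode
echo "ndbd was restarted on $(hostname); please complete the start from the management client" | \
        mail -s "ndbd restarted on $(hostname)" admin@example.com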

You might wonder why the preceding script uses ndbd -n rather than just ndbd (which would start the node completely and not require a <id> START command to be entered in the management client by an administrator). The answer is that you should never have a script automatically restart cluster nodes; if cluster nodes die, you should always investigate the problem. Of course, it would be perfectly possible to remove the nostart flag from the script and also to write a similar version for ndb_mgmd on the management node; this would allow you to reboot all your machines and have them restart the cluster when they come back up. We strongly recommend that you not do this; there are many situations in which you may not actually want both nodes to attempt to start (for example, if you suspect that one node has a corrupt copy of the database, you might want to start the other node(s) in the same node group and then start the dodgy node by using --initial). We believe it is important for an administrator to be around to investigate the cause of node failure.

Note

If you use a script like this and you change config.ini, you must manually stop each storage node and restart it by using --initial. If you just stop a node and leave it to the script to restart, the script will simply try to start it every 2 minutes and continue to fail, resulting in downtime for you.
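A sketch of that manual sequence for storage node 2 in the sample cluster:

ndb_mgm> 2 STOP
Node 2 has shutdown.

# Then, logged in to storage node 2 itself:
ndbd --initial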
