Linux Clustering with CSM and GPFS
6.4 Running commands on the nodes
When working on a cluster, the administrator usually wants to run the same command across a number of nodes, often the entire cluster. CSM provides two ways of doing this: a simple command-line tool, dsh, and a Java GUI, DCEM.
6.4.1 Distributed shell (dsh)
The dsh utility is a lightweight command-line tool that allows parallel commands to be easily issued from the terminal.
Example 6-23 shows the use of the date command with dsh to find out the time on all the machines in the cluster.
Example 6-23: Displaying the time on all nodes in the cluster using dsh
[root@master /]# dsh -a date
node4.cluster.com: Wed July 9 18:10:04 CDT 2003
node3.cluster.com: Wed July 9 18:10:04 CDT 2003
storage1.cluster.com: Wed July 9 18:10:04 CDT 2003
node1.cluster.com: Wed July 9 18:10:04 CDT 2003
node2.cluster.com: Wed July 9 18:10:04 CDT 2003
[root@master /]#
Note that the output does not come back in any particular order. Example 6-24 shows how the sort command can be used to arrange the output of dsh in node order.
Example 6-24: Formatting dsh output with sort
[root@master /]# dsh -a date | sort
node1.cluster.com: Wed July 9 18:10:44 CDT 2003
node2.cluster.com: Wed July 9 18:10:44 CDT 2003
node3.cluster.com: Wed July 9 18:10:44 CDT 2003
node4.cluster.com: Wed July 9 18:10:44 CDT 2003
storage1.cluster.com: Wed July 9 18:10:44 CDT 2003
[root@master /]#
Often, it is more useful to see which nodes return similar or different values. CSM provides dshbak for this purpose.
In Example 6-25, all the nodes produce the same output. Example 6-26 shows what happens when the output differs between nodes.
Example 6-25: Using dshbak to format the output of dsh
[root@master /]# dsh -a date | dshbak -c
HOSTS -------------------------------------------------------------------------
node1.cluster.com, node2.cluster.com, node3.cluster.com, node4.cluster.com, storage1.cluster.com
-------------------------------------------------------------------------------
Wed July 9 18:11:26 CDT 2003
[root@master /]#
Example 6-26: dsh and dshbak with different outputs
[root@master /]# dsh -a ls -l /tmp/file | dshbak -c
HOSTS -------------------------------------------------------------------------
node2.cluster.com, node3.cluster.com, node4.cluster.com, storage1.cluster.com
-------------------------------------------------------------------------------
-rw-r--r-- 1 root root 0 July 9 18:17 /tmp/file
HOSTS -------------------------------------------------------------------------
node1.cluster.com
-------------------------------------------------------------------------------
-rw-r--r-- 1 root root 4 July 9 18:17 /tmp/file
[root@master /]#
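The idea of grouping nodes by identical output can be illustrated without a cluster. The following is a minimal sketch, not the dshbak implementation: it assumes each node returns a single line in the "host: output" form shown above, and the function names sample_output and group_by_output are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# sample_output: stand-in for captured dsh output (illustrative values only).
sample_output() {
    cat <<'EOF'
node2.cluster.com: -rw-r--r-- 1 root root 0 July 9 18:17 /tmp/file
node1.cluster.com: -rw-r--r-- 1 root root 4 July 9 18:17 /tmp/file
node3.cluster.com: -rw-r--r-- 1 root root 0 July 9 18:17 /tmp/file
EOF
}

# group_by_output: list the hosts that reported each distinct output line.
# Assumes one output line per host; multi-line output is not handled.
group_by_output() {
    sort | awk '
        {
            host = $1
            sub(/:$/, "", host)          # drop the trailing colon from the host
            line = $0
            sub(/^[^ ]+ /, "", line)     # the rest of the line is the output
            if (!(line in hosts)) order[++n] = line
            hosts[line] = (hosts[line] == "" ? host : (hosts[line] ", " host))
        }
        END {
            for (i = 1; i <= n; i++) {
                print "HOSTS: " hosts[order[i]]
                print order[i]
            }
        }'
}

sample_output | group_by_output
```

With the sample data, node1 is reported on its own because its file size differs, while node2 and node3 are listed together. On a real cluster, dshbak -c remains the supported tool for this job.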
Note that in all the above examples, the piped commands (sort and dshbak) ran on the management node. Many shell meta-characters, including pipe (|), semicolon (;), and redirection (<, > and >>), must be enclosed within quotes if you want the operation to occur on the cluster nodes instead of locally on the management node. For example:
# dsh -av 'rpm -aq | grep glibc'
Tip: If you need to dsh a command that includes special characters but you are unsure how to quote them correctly, create a script file in a shared directory and use dsh to run the script file. Alternatively, use DCEM, which does not suffer from the same special character problems (see 6.4.2, "Distributed command execution manager (DCEM)" on page 169).
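The effect of quoting can be tried safely on a single machine. The sketch below uses run_on_node, a hypothetical stand-in for dsh that runs its arguments in a child shell and prefixes each output line with a made-up host name; it is an assumption for illustration and says nothing about how dsh itself is implemented.

```shell
#!/bin/sh
# run_on_node: hypothetical stand-in for dsh targeting one node. It runs its
# arguments in a child shell and prefixes every output line with a fake host.
run_on_node() {
    sh -c "$*" | sed 's/^/node1: /'
}

# Unquoted: the calling shell splits the pipeline, so tr runs locally and
# also converts the host prefix.
run_on_node echo hello | tr '[:lower:]' '[:upper:]'

# Quoted: the whole pipeline is handed to the child shell, so tr runs there,
# before the host prefix is added.
run_on_node 'echo hello | tr "[:lower:]" "[:upper:]"'
```

The first call prints the prefix in upper case because tr processed the already prefixed output locally. The second leaves the prefix untouched, which mirrors what happens when a quoted pipeline is evaluated on the cluster nodes.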
A commonly used feature of dsh is the -v switch. It verifies each node's availability (based on lsnode -p) before connecting, which saves waiting for the underlying remote shell (rsh or ssh) to time out. Example 6-27 shows what happens when dsh -v is used and a node is not responding.
Example 6-27: dsh -v with a down node
[root@master /]# dsh -av date
dsh: node4.cluster.com Host is not responding. No command will be issued to this host
node1.cluster.com: Wed July 9 18:25:43 CDT 2003
node2.cluster.com: Wed July 9 18:25:43 CDT 2003
node3.cluster.com: Wed July 9 18:25:43 CDT 2003
storage1.cluster.com: Wed July 9 18:25:43 CDT 2003
[root@master /]#
Performing a large number of operations simultaneously can cause problems, for example by putting excessive load on a file server. By default, dsh attempts to run the specified commands in parallel on up to 64 nodes. This "fan-out" value may be changed by setting the DSH_FANOUT environment variable or by using the -f switch to dsh:
# dsh -avf 16 rpm -i /nfs/*.rpm
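The fan-out concept itself, a cap on how many commands run at once, can be sketched with standard tools. The example below swaps in xargs with the -P option (an extension available in GNU and BSD xargs) in place of dsh; run_with_fanout is a hypothetical helper, the node names are placeholders, and the echo stands in for the real per-node command.

```shell
#!/bin/sh
# run_with_fanout LIMIT: read host names on stdin and run a placeholder
# command for each one, with at most LIMIT running at the same time.
run_with_fanout() {
    # -P caps the number of parallel jobs; -I runs one command per input line.
    xargs -P "$1" -I '{}' sh -c 'echo "$1: done"' _ '{}'
}

# Four placeholder nodes, at most two commands in flight at any moment.
printf '%s\n' node1 node2 node3 node4 | run_with_fanout 2
```

As with dsh, the order of the output lines is not guaranteed, because jobs finish independently. A lower limit trades total run time for reduced load on shared resources.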
6.4.2 Distributed command execution manager (DCEM)
In contrast to the lightweight command-line tool dsh, DCEM is a Java GUI that performs a similar task. DCEM allows you to construct command specifications for execution on multiple target machines and provides real-time status as the commands are executed. You can enter the command definition, run-time options, and selected hosts and groups for a command, and you can save this command specification for future use. You can also create and modify groups of hosts to use as targets for a command directly from DCEM.
Start DCEM from the command line by running:
# dcem
Figure 6-1 on page 170 shows all xosview windows from our four compute nodes on the GNOME desktop of our management node.
DCEM saves its logs in /root/dcem/logs and the commands in /root/dcem/scripts.
Example 6-28: DCEM logs
TIME: July 19 23:42:58.581
INFO: Command Name:xterm
Command: xterm -display master:0.0
Successful Machines: node2.cluster.com node1.cluster.com node3.cluster.com node4.cluster.com
Failed Machines:

TIME: July 19 23:44:01.444
INFO: Command Name:gnome-term
Command: gnome-terminal -display master:0.0
Successful Machines: node2.cluster.com node1.cluster.com node3.cluster.com node4.cluster.com
Failed Machines:

TIME: July 19 23:45:45.385
INFO: Command Name:nxterm
Command: nxterm -display master:0.0
Successful Machines: node1.cluster.com node2.cluster.com node3.cluster.com node4.cluster.com
Failed Machines:
For a complete description of the Distributed Command Execution Manager functions, refer to the IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873.