Linux Clustering with CSM and GPFS



1.6 Cluster logical structure

As mentioned, clusters are typically made up of a large number of computers, often called nodes. Those nodes are configured so that they can perform different logical functions in a cluster. Figure 1-2 on page 10 presents the logical functions that a physical node in a cluster can provide. Remember, these are logical functions; in some cases, multiple logical functions may reside on the same physical node, and in other cases, a logical function may be spread across multiple physical nodes.

Figure 1-2: Logical functions of a physical node

A typical logical structure of a cluster provides the following functions:

1.6.1 Cluster node types and xSeries offerings

Now that we understand the logical functions that make up a typical clustered environment, let us look at the functional node types on which clusters are built. For a more detailed examination of the hardware associated with each node type, refer to Chapter 2, "New Linux cluster offering from IBM: Cluster 1350" on page 21.

Nodes within a cluster are generally categorized by functionality, and are listed here:

Management nodes

Management node is a generic term; this node is also known as the head node or master node. The management node helps control the cluster and can be used in additional ways as well. It often contains a dedicated network interface connected to a private virtual local area network (VLAN), defined as a management VLAN and accessible only to the cluster administrator.
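As an illustration, on a Red Hat-based management node the dedicated interface on the management VLAN might be configured with a static private address. The interface name, address, and file contents below are assumptions for this sketch, not part of any Cluster 1350 specification:

   # /etc/sysconfig/network-scripts/ifcfg-eth1
   # Hypothetical second interface dedicated to the private management VLAN
   DEVICE=eth1
   BOOTPROTO=static
   IPADDR=192.168.0.1
   NETMASK=255.255.255.0
   ONBOOT=yes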

Management nodes generally provide one or more of the logical node functions described in the previous section:

In a small cluster of, say, eight compute nodes, all of these functions can be combined in one head node. In larger clusters, the functions are best split across multiple machines for security and performance reasons.

From the IBM Linux cluster perspective, the management node that is currently offered for use on the IBM Cluster 1350 is the x345.

A summary of the specifications for the x345 machines is provided below. Also check 2.2.2, "Cluster nodes" on page 25 for more details on the x345.

Compute nodes

The compute node is the computational heart of the cluster. Most communication with these nodes is done using a parallel shell command (dsh), as sketched below. The compute node performs only the compute functions in a cluster. Compute nodes are specialized, high-performance machines designed for large clusters.
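For example, from the management node an administrator might fan a command out to the compute nodes with dsh. The host names below are hypothetical, and the exact flags available depend on the dsh version shipped with CSM:

   # Run a command on every node defined to the management server
   dsh -a date

   # Run a command on two specific compute nodes (hypothetical host names)
   dsh -n node001,node002 "cat /proc/loadavg"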

The choice of hardware and configuration parameters of the compute node should be based on the applications that will be run on the system. The following characteristics should be considered:

As we have said, applications drive the decision of how to build the compute nodes. For example, if the compute node is required to access its cache frequently, the size of the L2 cache and the memory subsystem should be considered; a large cache may enhance performance. On the other hand, applications that use large amounts of memory benefit from a faster CPU, system bus, and memory subsystem.

From the IBM Linux cluster perspective, compute nodes are typically either the x335 or the BladeCenter™ HS20 machines. Optionally, the x345 machines could also be used as compute nodes if the x335 does not meet the customer's configuration needs.

A summary of the specifications for both the x335 and HS20 machines is provided below. Also check 2.2.2, "Cluster nodes" on page 25 for details on both the x335 and the HS20 models:

Storage nodes

Often, when discussing cluster structures, a storage node is defined as a third type of node. In practice, however, a storage node is often just a specialized version of either a management node or a compute node. The reason storage nodes are sometimes designated as a unique node type is that their hardware and software requirements might differ from those of other management or compute nodes. Depending on your storage requirements and the type of storage access you need, this may include special adapters and drivers to support RAID5, storage devices attached via Fibre Channel, and others.
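As a quick illustration of what this verification can look like, an administrator might confirm that the required adapter driver is loaded before configuring the attached storage. The qla2300 module name below assumes a QLogic Fibre Channel HBA and is only an example:

   # Confirm the (assumed) Fibre Channel HBA driver is loaded
   lsmod | grep qla2300

   # List the SCSI devices the kernel currently sees
   cat /proc/scsi/scsi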

From the IBM Linux cluster perspective, the storage nodes are typically either the x345 or the x360 machines. In addition to these base options, customers can choose the x335 as a third option if it is determined to be a better fit for their specific solution.

When configured as storage nodes, the x335, x345, and x360 systems may be connected to a FAStT200 storage controller with up to five EXP500 disk expansion units, providing over 8 TB of storage in a basic configuration. If more storage is needed, customers have the option to upgrade to the FAStT700 storage controller with up to 16 EXP700 disk expansion units, providing over 32 TB of disk storage in a basic configuration.
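As a rough sanity check of those figures, assume 146.8 GB drives (the largest commonly offered for these units at the time), 10 drive bays in the FAStT200 and in each EXP500, and 14 bays in each EXP700; the drive size and bay counts are assumptions for this estimate, not quoted specifications:

   FAStT200 + 5 x EXP500:  (10 + 5 x 10) x 146.8 GB =  60 x 146.8 GB, about  8.8 TB
   FAStT700 + 16 x EXP700: (16 x 14)     x 146.8 GB = 224 x 146.8 GB, about 32.9 TB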

From the IBM Linux cluster perspective, the storage node that is currently preferred for use in the IBM Cluster 1350 is the x345.

A summary of the specifications for the x360 machines is provided below. Also check 2.2.2, "Cluster nodes" on page 25 for more details on the x360.


