Microsoft Exchange 2000 Server Administrator's Companion
A server cluster is a group of independent servers running the Cluster service and working together as a single unit. Clusters provide high availability, scalability, and manageability for resources and applications by grouping multiple servers running Microsoft Windows 2000 Advanced Server or Microsoft Windows 2000 Datacenter Server into a single administrative unit.
NOTE
This chapter provides only an overview of the Cluster service in Windows 2000 Advanced Server, concentrating on how Exchange 2000 Server leverages this service. For a more complete discussion of clustering, please consult the Microsoft Windows 2000 Server Resource Kit (Microsoft Press, 2000).
Clustered servers share a common storage medium, such as an external drive bank of SCSI drives using a hardware-based RAID solution. Most often, information is striped across the set of disks along with parity information, which is, essentially, redundant data calculated from the stored data so that the contents of any single failed disk can be rebuilt from the remaining disks. This type of configuration is called RAID-5 and is very common in the networking world today.
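To make the idea of parity concrete, here is a minimal sketch of how a lost block can be rebuilt with XOR parity. The block contents and the three-data-disk layout are assumptions chosen for illustration; a real RAID-5 controller also rotates the parity block among the disks, which this sketch does not model.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of one stripe spread over three data disks.
data_blocks = [b"MAIL", b"BOX1", b"LOGS"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block. This works for one failed disk per stripe, which is why RAID-5 does not replace regular backups.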
Clustering provides high availability by allowing applications to run on any single server in the cluster. If the host server goes down, the applications are moved to another server in the cluster, where they can continue to run. This movement is, of course, transparent to the users. The users know only that they have connected to the server for e-mail. They do not know which server in the cluster they have connected to. The Cluster service directs their input and output to the correct server in the cluster.
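The following toy model sketches why the move is transparent: clients address a single server name, and the cluster decides which physical node answers. The class, the name EXCHANGE-VS1, and the node names are hypothetical and are not part of any Windows 2000 API.

```python
class Cluster:
    """Toy model: clients address one server name, never a physical node."""

    def __init__(self, virtual_name, nodes):
        self.virtual_name = virtual_name
        self.nodes = list(nodes)
        self.owner = self.nodes[0]  # node currently hosting the application

    def fail(self, node):
        """Remove a failed node and move ownership to a surviving node."""
        self.nodes.remove(node)
        if not self.nodes:
            raise RuntimeError("no surviving node to take over")
        if self.owner == node:
            self.owner = self.nodes[0]

    def connect(self, name):
        """Resolve the server name to whichever node owns the application now."""
        if name != self.virtual_name:
            raise LookupError(f"unknown server name: {name}")
        return self.owner


cluster = Cluster("EXCHANGE-VS1", ["NODE-A", "NODE-B"])
before = cluster.connect("EXCHANGE-VS1")
cluster.fail("NODE-A")
after = cluster.connect("EXCHANGE-VS1")

print(before, after)
```

The client uses the same name before and after the failure; only the node behind that name changes.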
Problems Solved by Clustering
Clustering avoids the problems that can arise due to a hardware failure, such as a blown CPU, bad memory, or the loss of an entire computer. It also allows services to remain alive for users when there is a planned outage, such as routine maintenance, a software or firmware upgrade, or a configuration change.
Clustering also monitors the health of the installed software applications. If it detects problems with a particular application, it may take actions such as attempting to restart the software or even moving the software focus to a different server in the cluster.
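The restart-then-move behavior can be sketched as a simple policy. This is only an illustration: the function name, the action strings, and the restart threshold of three are assumptions, not the actual settings or logic of the Cluster service.

```python
def respond_to_failures(health_checks, restart_threshold=3):
    """Return the action taken after each health check result.

    health_checks: iterable of booleans, True meaning the application looked healthy.
    restart_threshold: hypothetical number of restarts to try before failing over.
    """
    actions = []
    restarts = 0
    for healthy in health_checks:
        if healthy:
            restarts = 0
            actions.append("none")
        elif restarts < restart_threshold:
            restarts += 1
            actions.append("restart on same node")
        else:
            restarts = 0
            actions.append("fail over to another node")
    return actions


actions = respond_to_failures([True, False, False, False, False])
for action in actions:
    print(action)
```

Here the application is restarted in place three times, and only the fourth consecutive failure triggers a move to another node.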
Problems Not Solved by Clustering
Clustering does not solve problems related to poor management, such as data not being backed up or poor-quality hardware used to implement RAID-5. Neither does it help in the event of a major disaster, such as when servers are physically destroyed. Because clustering in Windows 2000 depends on the Active Directory directory service and DNS, clustering also does not help if these technologies have been poorly implemented or are not available.
Clustering Terminology
Before we begin our discussion, it's important to understand some of the basic terminology. We'll define some terms in staccato fashion and then move on to other clustering topics.
- Shared-nothing architecture Also known as active/passive architecture, the shared-nothing architecture makes one of the physical nodes responsible for running an application while the other servers, or nodes, wait on the sidelines for the first physical server to fail so that they can leap into action and take over the application. Only one server works at any given time for an application. This architecture is why clustering has been viewed as such an expensive solution to implement, and it is the model used in Microsoft Windows NT Server 4.
- Shared-everything architecture Also known as active/active, the shared-everything architecture gives any physical server in the cluster access to all of the data and application code at any given time and can offer these services to the client as needed. In this scenario, hardware is better utilized; it is the architecture used by the older VAX/VMS servers.
- High availability The aim of high availability is to minimize downtime. Windows Clustering is a highly available solution, but it does not guarantee nonstop operation.
- Fault tolerance The aim of fault tolerance is to eliminate downtime. Windows Clustering is not fault tolerant. Instead, fault tolerance is usually provided at the hard disk and controller level.
- Resource A resource is an entity that provides a service, such as a disk or an IP address.
- Group A group is a combination of resources that are managed as a unit.
- Dependency A dependency is a relationship between two resources in which one resource requires the other to be online before it can start. For example, a network name resource depends on an IP address resource.
- Failover/failback These terms refer to the process of moving a resource from one server to another. Failover happens when a problem occurs on the active server and services must be transferred to the passive server. Failback happens when the failed server returns to service and the resources are moved back to it.
- Quorum resource The quorum resource stores the cluster management data and is usually held on a shared disk.
- Heartbeat A heartbeat is a group of packets that are sent over a private IP network between nodes to detect the health of the other nodes, as well as the applications and services they manage within the cluster.
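To illustrate how resources, groups, and dependencies fit together, the sketch below models one group as a dependency graph and derives an order in which its resources could be brought online. The resource names and their dependencies are assumptions chosen for illustration; the ordering uses Python's standard graphlib module (Python 3.9 or later), not anything from the Cluster service itself.

```python
from graphlib import TopologicalSorter

# One hypothetical group: each resource maps to the resources it depends on.
dependencies = {
    "Physical Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "Information Store": {"Physical Disk", "Network Name"},
}

# A resource can start only after everything it depends on is online.
online_order = list(TopologicalSorter(dependencies).static_order())

# Taking the group offline reverses the order.
offline_order = list(reversed(online_order))

print("online: ", online_order)
print("offline:", offline_order)
```

Because the group is managed as a unit, a failover moves every resource in the graph to the other node and brings them online there in dependency order.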
Advantages of Using Windows Clustering
Windows Clustering has several attractive benefits, including the following:
- High availability With a server cluster, ownership of resources such as disks or IP addresses is automatically transferred from a failed server to the surviving server. The software is restarted on the surviving server, and users experience only a momentary pause in service.
- Failback Windows Clustering automatically rebalances the workload when a failed server comes back online.
- Manageability You can use the Cluster Administrator (discussed later in this chapter) to manage a cluster as a single system and to manage applications as if they were running on a single server, even though they are running on separate servers.
- Scalability Server clusters can grow to meet increasing demands. For instance, when the overall capacity of the servers in the cluster can no longer meet the demand placed on them by the users, you can add additional servers to the cluster without needing to re-create the cluster itself.
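The failover and failback behavior described in this list can be sketched as a placement rule. The function, its parameters, and the node names are hypothetical; the sketch assumes a single preferred owner per group and is far simpler than the real failover and failback policies.

```python
def place_group(preferred_owner, online_nodes, current_owner=None, allow_failback=True):
    """Pick the node that should own a resource group.

    All names and parameters here are illustrative assumptions.
    """
    if not online_nodes:
        raise RuntimeError("no node is online")
    if current_owner in online_nodes:
        # The current owner is healthy; move only if failback applies.
        if allow_failback and preferred_owner in online_nodes:
            return preferred_owner
        return current_owner
    # Initial placement, or the current owner has failed: fail over.
    if preferred_owner in online_nodes:
        return preferred_owner
    return sorted(online_nodes)[0]


owner = place_group("NODE-A", {"NODE-A", "NODE-B"})
after_failure = place_group("NODE-A", {"NODE-B"}, current_owner=owner)
after_repair = place_group("NODE-A", {"NODE-A", "NODE-B"}, current_owner=after_failure)
no_failback = place_group(
    "NODE-A", {"NODE-A", "NODE-B"}, current_owner=after_failure, allow_failback=False
)

print(owner, after_failure, after_repair, no_failback)
```

With failback allowed, the group returns to its preferred node once that node is repaired; with failback disabled, it stays where it is until an administrator moves it.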
Exchange 2000 Clustering
The clustering solution in Exchange Server 5.5 is a shared-nothing solution that was not met with great enthusiasm in the marketplace because it is costly to implement and because it calls for the existence of one server that does little but wait for a failure on the active server.
Exchange 2000 Server leverages the active/active implementation of clustering, meaning that all of the hardware capacities of each server are exploited. With this implementation, two or more physical machines can be running Exchange services and offering these services on the network. Exchange 2000 Server supports as many nodes as your version of Windows 2000 supports. Windows 2000 Advanced Server supports two nodes. Windows 2000 Datacenter Server supports four nodes. As these numbers increase with future releases of Windows 2000, Exchange 2000 Server will support more nodes accordingly.
NOTE
Even in a clustered environment, not all components in Exchange 2000 Server are active/active. The Message Transfer Agent, public folder hierarchy, and Chat Service are active/passive, meaning that only one server in the cluster will offer these services to the network at any given time.
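As a rough illustration of the node limits and the active/passive exceptions just described, the sketch below reports which nodes could offer a given component. The function and the node names are hypothetical, and treating every unlisted component as available on all nodes is a simplifying assumption.

```python
# Node limits and active/passive components as stated in this chapter.
MAX_NODES = {"Advanced Server": 2, "Datacenter Server": 4}
ACTIVE_PASSIVE = {"Message Transfer Agent", "Public folder hierarchy", "Chat Service"}


def hosting_nodes(component, edition, nodes):
    """Return the nodes that offer a component at one point in time."""
    if len(nodes) > MAX_NODES[edition]:
        raise ValueError(f"{edition} supports at most {MAX_NODES[edition]} nodes")
    if component in ACTIVE_PASSIVE:
        return list(nodes[:1])  # only one node offers the service at a time
    return list(nodes)          # assumption: offered from every node


nodes = ["NODE-A", "NODE-B"]
mta_nodes = hosting_nodes("Message Transfer Agent", "Advanced Server", nodes)
other_nodes = hosting_nodes("Some other Exchange service", "Advanced Server", nodes)

print(mta_nodes, other_nodes)
```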
Exchange 2000 Server also supports rolling upgrades, which means that during an upgrade, the Exchange services are moved to the second node, which continues to offer the services to the network while the Exchange software is upgraded on the first node. When the upgrade is completed on the first node, services are moved back to it and the Exchange software is upgraded on the second node. This activity is all transparent to the user and is one of the ways in which Exchange 2000 Server's clustering capability achieves high availability.
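The rolling upgrade sequence for a two-node cluster can be written out step by step. The function and node names below are hypothetical; the sketch only records which node serves users during each step and performs no real upgrade.

```python
def rolling_upgrade(first, second):
    """Return the ordered steps as (action, node serving users) tuples."""
    return [
        (f"move Exchange services from {first} to {second}", second),
        (f"upgrade Exchange on {first}", second),
        (f"move Exchange services back to {first}", first),
        (f"upgrade Exchange on {second}", first),
    ]


steps = rolling_upgrade("NODE-A", "NODE-B")
for action, serving in steps:
    print(f"{action:45} users served by {serving}")
```

At every step the node being upgraded is never the node serving users, which is what keeps the upgrade transparent.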
Figure 20-1 shows an Exchange 2000 server cluster installed on two physical servers that share a common storage device. In the Disk Management folder in the Windows 2000 Computer Management snap-in, the external disks appear as ordinary disks on the server that has access to them. In this illustration, disks 1 and 2 are in the external drive bay. They are not mirrored, although in nearly all production environments, you will want to use some type of fault tolerance at the hardware level. Windows 2000 provides the Cluster service so that even though both machines are connected to the shared data, only one server can access that data at any given time.
Figure 20-1. Disk configuration for a server cluster.
Windows 2000 requires that all data be on the shared drive so that when there is a failover, the surviving node can access that data. The binary files (Exchange system files) are still held on the individual servers in the cluster.