Linux Clustering with CSM and GPFS



This chapter provides an overview of the General Parallel File System (GPFS) architecture, components, and terminology. It also describes the different kinds of implementations that can be used with the product.

4.1 Introduction to GPFS

IBM General Parallel File System (GPFS) is IBM's first shared-disk file system. It was initially released on the RS/6000® SP in 1998, using a software simulation of a storage area network called the IBM Virtual Shared Disk (VSD). At its core, GPFS is a parallel disk file system. The parallel nature of GPFS guarantees that the entire file system is available to all nodes within a defined scope and that the file system's services can be safely applied to the same file system on multiple nodes simultaneously.

The IBM General Parallel File System gives users shared access to files that may span multiple disk drives on multiple nodes. It offers many of the standard UNIX file system interfaces, allowing most applications to execute without modification or recompilation. Standard UNIX file system utilities are also supported, so users can continue to use the commands they have always used for ordinary file operations. The only new commands are those for administering the GPFS file system itself.
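As a brief illustration, the session below mixes ordinary UNIX utilities with a single GPFS administration command. This is a minimal sketch: the mount point /gpfs, the file system name gpfs0, and the file names are assumptions rather than output from a real cluster.

   # df -h /gpfs                   # standard utility; a GPFS mount behaves like a local file system
   # cp /tmp/results.dat /gpfs/    # ordinary file operations require no special commands
   # ls -l /gpfs/results.dat
   # mmlsfs gpfs0                  # GPFS administration command: list file system attributes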

GPFS provides file system services to parallel and serial applications. It allows parallel applications simultaneous access to the same file, or to different files, from any node in a GPFS nodeset, while maintaining a high level of control over all file system operations (see "GPFS nodeset" below).

GPFS is particularly appropriate in environments where the aggregate peak demand for data exceeds the capability of a single distributed file system server. It is not appropriate where hot backup is the main requirement or where data is readily partitioned along individual node boundaries.

This chapter primarily addresses GPFS Version 1.3.0, which was the most current version when this book was written.

4.1.1 GPFS terms and definitions

GPFS uses many terms, some unique to GPFS and some more general. To prevent confusion, we define them in this section.

GPFS cluster

A GPFS cluster is a collection of nodes over which a shared-disk file system is defined, providing concurrent data access from all nodes in the cluster.

GPFS nodeset

A GPFS nodeset is a group of nodes that all run the same level of GPFS code and operate on the same file systems.
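To make these two terms concrete, the sketch below outlines how a cluster and a nodeset might be created. The node list, host names, and flags are indicative of the Version 1.3 command syntax and should be treated as a hedged example, not a definitive procedure.

   # cat /tmp/gpfs.nodes           # hypothetical file listing the cluster nodes
   node001
   node002
   node003
   node004
   # mmcrcluster -t lc -n /tmp/gpfs.nodes -p node001 -s node002
                                   # define the cluster; -p and -s name the primary and
                                   # secondary GPFS configuration data servers
   # mmconfig -n /tmp/gpfs.nodes -A
                                   # define a nodeset over the same nodes; -A starts GPFS
                                   # automatically at boot time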

GPFS Open Source Portability Layer

The GPFS Open Source Portability Layer is a set of source files that can be compiled to provide a Linux kernel abstraction layer for GPFS. It is used to enable communication between the Linux kernel and the GPFS kernel modules.
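In releases of this vintage, the portability layer is built from the source shipped in /usr/lpp/mmfs/src on each node. The steps below are indicative only; the exact make targets and the site.mcr settings for your kernel level vary by release, so consult the documentation shipped with the source.

   # cd /usr/lpp/mmfs/src
   # vi config/site.mcr            # declare the kernel level and distribution for this node
   # make World                    # compile the portability layer kernel modules
   # make InstallImages            # install the resulting modules for GPFS to load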

GPFS Network Shared Disk (NSD)

GPFS Network Shared Disk (NSD) is the GPFS disk subsystem that provides remote disk access and global disk naming for a GPFS shared-disk file system.
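As a hedged sketch of how NSDs are defined: each line of a disk descriptor file names a local disk, the primary and backup servers that export it, its usage, and a failure group, and the mmcrnsd command then assigns the global names. The disks, host names, and field values below are hypothetical, and the descriptor format shown is indicative of the Version 1.3 syntax.

   # cat /tmp/descfile             # DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup
   /dev/sdb:node001:node002:dataAndMetadata:1
   /dev/sdc:node002:node001:dataAndMetadata:2
   # mmcrnsd -F /tmp/descfile      # create the NSDs; the file is rewritten with the global names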

Failure group

A failure group is a set of disks that share a common point of failure that could cause them all to become simultaneously unavailable.
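Failure groups become important when replication is enabled, because GPFS places each replica in a different failure group. Continuing the hypothetical descriptor file above, whose two disks sit in failure groups 1 and 2, a file system created with two copies of data and metadata can survive the loss of either group. The replication flags are genuine mmcrfs options, but the device and mount point names are assumptions.

   # mmcrfs /gpfs gpfs0 -F /tmp/descfile -m 2 -r 2
                                   # -m 2: two replicas of metadata, -r 2: two replicas of data,
                                   # with each pair split across the two failure groups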

Metadata

Metadata consists of i-nodes and indirect blocks that contain the file size, the time of last modification, and the addresses of all disk blocks that make up the file's data. GPFS uses metadata to locate and organize the user data contained in the file system's striped blocks.
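Much of this metadata is visible through standard interfaces. For example, the ordinary stat utility reports the size, modification time, and block usage recorded in a file's i-node; the path below is an assumption.

   # stat /gpfs/results.dat        # size, mtime, and allocated blocks come from the i-node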

Quorum

Quorum is a simple rule that ensures the integrity of a cluster and the resources under its administration. In GPFS, quorum is achieved when the GPFS daemons on a majority of the nodes (one plus half of the number of nodes in the nodeset) are in the active state and able to communicate with each other. For example, in a 32-node nodeset, at least 17 daemons must be active for quorum to hold.

4.1.2 What is new in GPFS for Linux Version 1.3

Version 1.3 introduces several enhancements over previous releases.

4.1.3 GPFS advantages

GPFS provides several advantages when building a cluster; we summarize some of them here.

