Linux Clustering with CSM and GPFS



This chapter provides an overview of the General Parallel File System (GPFS) architecture, components, and terminology. It also describes the different kinds of implementations that can be used with the product.

4.1 Introduction to GPFS

IBM General Parallel File System (GPFS) is IBM's first shared-disk file system. It was initially released on the RS/6000® SP in 1998, using a software simulation of a storage area network called the IBM Virtual Shared Disk (VSD). At its core, GPFS is a parallel disk file system. The parallel nature of GPFS guarantees that the entire file system is available to all nodes within a defined scope and that the file system's services can be safely applied to the same file system on multiple nodes simultaneously.

The IBM General Parallel File System gives users shared access to files that may span multiple disk drives on multiple nodes. It offers many of the standard UNIX file system interfaces, allowing most applications to execute without modification or recompilation. Standard UNIX file system utilities are also supported, so users can continue to use the commands they have always used for ordinary file operations. The only new commands are those for administering the GPFS file system itself.
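As a brief illustration, the session below mixes ordinary UNIX utilities with a single GPFS administration command. This is a minimal sketch: the mount point /gpfs, the file system name gpfs0, and the file names are assumptions rather than output from a real cluster.

   # df -h /gpfs                   # standard utility; a GPFS mount behaves like a local file system
   # cp /tmp/results.dat /gpfs/    # ordinary file operations require no special commands
   # ls -l /gpfs/results.dat
   # mmlsfs gpfs0                  # GPFS administration command: list file system attributes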

GPFS provides file system services to parallel and serial applications. It allows parallel applications simultaneous access to the same file, or to different files, from any node in a GPFS nodeset, while maintaining a high level of control over all file system operations (see "GPFS nodeset" below).

GPFS is particularly appropriate in environments where the aggregate peak demand for data exceeds the capability of a single distributed file system server. It is not appropriate where hot backup is the main requirement or where data is readily partitioned along individual node boundaries.

This chapter primarily addresses GPFS Version 1.3.0, which was the most current version when this book was written.

4.1.1 GPFS terms and definitions

GPFS uses many terms, some unique to GPFS and some more general. To prevent confusion, we define them in this section.

GPFS cluster

A GPFS cluster is a collection of nodes over which a shared-disk file system is defined, providing concurrent data access from all nodes in the cluster.

GPFS nodeset

A GPFS nodeset is a group of nodes that all run the same level of GPFS code and operate on the same file systems.
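To make these two terms concrete, the sketch below outlines how a cluster and a nodeset might be created. The node list, host names, and flags are indicative of the Version 1.3 command syntax and should be treated as a hedged example, not a definitive procedure.

   # cat /tmp/gpfs.nodes           # hypothetical file listing the cluster nodes
   node001
   node002
   node003
   node004
   # mmcrcluster -t lc -n /tmp/gpfs.nodes -p node001 -s node002
                                   # define the cluster; -p and -s name the primary and
                                   # secondary GPFS configuration data servers
   # mmconfig -n /tmp/gpfs.nodes -A
                                   # define a nodeset over the same nodes; -A starts GPFS
                                   # automatically at boot time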

GPFS Open Source Portability Layer

The GPFS Open Source Portability Layer is a set of source files that can be compiled to provide a Linux kernel abstraction layer for GPFS. It is used to enable communication between the Linux kernel and the GPFS kernel modules.
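In releases of this vintage, the portability layer is built from the source shipped in /usr/lpp/mmfs/src on each node. The steps below are indicative only; the exact make targets and the site.mcr settings for your kernel level vary by release, so consult the documentation shipped with the source.

   # cd /usr/lpp/mmfs/src
   # vi config/site.mcr            # declare the kernel level and distribution for this node
   # make World                    # compile the portability layer kernel modules
   # make InstallImages            # install the resulting modules for GPFS to load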

GPFS Network Shared Disk (NSD)

GPFS Network Shared Disk (NSD) is the GPFS disk subsystem that provides remote disk access and global disk naming for a GPFS shared-disk file system.
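As a hedged sketch of how NSDs are defined: each line of a disk descriptor file names a local disk, the primary and backup servers that export it, its usage, and a failure group, and the mmcrnsd command then assigns the global names. The disks, host names, and field values below are hypothetical, and the descriptor format shown is indicative of the Version 1.3 syntax.

   # cat /tmp/descfile             # DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup
   /dev/sdb:node001:node002:dataAndMetadata:1
   /dev/sdc:node002:node001:dataAndMetadata:2
   # mmcrnsd -F /tmp/descfile      # create the NSDs; the file is rewritten with the global names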

Failure group

A failure group is a set of disks that share a common point of failure that could cause them all to become simultaneously unavailable.
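Failure groups become important when replication is enabled, because GPFS places each replica in a different failure group. Continuing the hypothetical descriptor file above, whose two disks sit in failure groups 1 and 2, a file system created with two copies of data and metadata can survive the loss of either group. The replication flags are genuine mmcrfs options, but the device and mount point names are assumptions.

   # mmcrfs /gpfs gpfs0 -F /tmp/descfile -m 2 -r 2
                                   # -m 2: two replicas of metadata, -r 2: two replicas of data,
                                   # with each pair split across the two failure groups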

Metadata

Metadata consists of i-nodes and indirect blocks that contain the file size, the time of last modification, and the addresses of all disk blocks that make up the file's data. GPFS uses metadata to locate and organize the user data contained in the file system's striped blocks.
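Much of this metadata is visible through standard interfaces. For example, the ordinary stat utility reports the size, modification time, and block usage recorded in a file's i-node; the path below is an assumption.

   # stat /gpfs/results.dat        # size, mtime, and allocated blocks come from the i-node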

Quorum

Quorum is a simple rule that ensures the integrity of a cluster and the resources under its administration. In GPFS, quorum is achieved when the GPFS daemons on a majority of the nodes (one plus half of the number of nodes in the nodeset) are in the active state and able to communicate with each other. For example, in a 32-node nodeset, at least 17 daemons must be active for quorum to hold.

4.1.2 What is new in GPFS for Linux Version 1.3

Version 1.3 introduces several enhancements over previous releases.

4.1.3 GPFS advantages

GPFS provides several advantages when building a cluster; we summarize some of them here.

