7.2 GPFS planning
The planning of a GPFS implementation is usually driven by four factors:
- The desired file system throughput and performance
- The desired service availability
- The availability and cost of storage and network hardware
- The GPFS file system size and replication requirements
The best performance and file system availability are achieved by using direct attached disks. You can also achieve high availability for your disks by using Network Shared Disk (NSD) servers, at the cost of lower performance.
Direct attached configurations are feasible when the number of nodes in the nodeset is relatively small. When the number of nodes in the nodeset is greater than the number of nodes that can be simultaneously attached directly to the disks, you must use Network Shared Disks (NSDs), with each disk defined on a primary and a secondary server.
You also have to define the size of your GPFS file system. The size of a GPFS file system depends on many factors, such as the block size and the number of replicas. In an environment without any replication, the file system size can be about 95 to 99 percent of the capacity of the disks. If you activate full replication (data and metadata), you will end up with less than 50 percent of the capacity of the disks. See 4.2.5, "Data and metadata replication capability" on page 89 for more information.
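As a rough sizing illustration (a sketch only; the 438 GB raw capacity and the 95 percent factor are assumed example values, not from any particular configuration), the usable space with and without full replication can be estimated as follows:

```bash
#!/bin/bash
# Hypothetical raw capacity: for example, three 146 GB disks in one file system
RAW_GB=438

# Without replication, roughly 95 to 99 percent of the raw capacity is usable
echo "No replication:   about $(echo "$RAW_GB * 0.95" | bc) GB"

# With full data and metadata replication (two copies of everything),
# expect less than half of the raw capacity to be usable
echo "Full replication: less than $(echo "$RAW_GB / 2" | bc) GB"
```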
7.2.1 Network implementation
The network utilization of the cluster is directly tied to the GPFS nodeset model that is used. In an NSD network attached model, all client nodes (all nodes that are not a primary server for some disk) use the network to access the GPFS file systems. In an NSD direct attached model, the network is used for metadata purposes only.
In an NSD network attached model, the primary network should be at least Gigabit Ethernet running in full-duplex mode. Access to disk data over the network can easily saturate a Fast Ethernet network. For better performance, Myrinet technology should be considered.
You should also consider using a dedicated network segment for GPFS node-to-node communication instead of using a single network for everything. This is not a requirement, but it can help improve performance.
GPFS does not support IP aliases for the network adapters it manages. You can define aliases on the nodes, but do not use them as the addresses that GPFS uses to communicate within the cluster.
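For example, a minimal /etc/hosts sketch (the host names and addresses here are hypothetical) could give each node one name on the management network and one on a dedicated GPFS interconnect; the GPFS node definitions would then reference the interconnect names, which map to real interfaces rather than IP aliases:

```
# /etc/hosts (sketch) - management LAN plus a dedicated GPFS interconnect
10.0.0.1      node1           # management / CSM network
10.0.0.2      node2
10.0.0.3      node3
192.168.1.1   node1-gpfs      # dedicated Gigabit Ethernet segment for GPFS
192.168.1.2   node2-gpfs
192.168.1.3   node3-gpfs
```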
7.2.2 Documentation
The use of worksheets and drawings for documenting and planning the GPFS cluster installation and management is one of the keys to success. These worksheets help you manage the disks, file systems, and nodes; hence, they must contain enough information about all of the systems involved: the Linux installation, network configuration, disk definitions, file system information, and information about the cluster itself.
Table 7-1 and Table 7-2 are examples of completed worksheets for a GPFS cluster.
Table 7-1 File system worksheet

| File system | Mount point | NSDs | Replication |
|---|---|---|---|
| /dev/gpfs0 | /gpfs0 | gpfs1nsd; gpfs2nsd; gpfs3nsd | Data only |
| /dev/gpfs1 | /gpfs1 | gpfs4nsd; gpfs5nsd; gpfs6nsd | Metadata and data |
| /dev/gpfs2 | /gpfs2 | gpfs7nsd; gpfs8nsd | Metadata only |
Table 7-2 Network Shared Disk (NSD) worksheet

| Disk PVID | NSD | Holds (data/metadata) | Primary server | Secondary server |
|---|---|---|---|---|
| C0A800E93BE1DAF6 | gpfs1nsd | D/M | node1 | node2 |
| C0A800E93BE1DAF7 | gpfs2nsd | D/M | node2 | node1 |
| C0A800E93BE1DAF8 | gpfs3nsd | D/M | node3 | node1 |
| C0A800E93BFA7D86 | gpfs4nsd | D | node1 | node2 |
| C0A800E93BFA7D87 | gpfs5nsd | D | node2 | node1 |
| C0A800E93BFA7D88 | gpfs6nsd | D/M | node3 | node1 |
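The information in Table 7-2 maps almost directly to the disk descriptor file that is passed to mmcrnsd. The sketch below assumes the descriptor format DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup; the device names and failure group numbers are hypothetical, so verify the exact format against the GPFS documentation for your release:

```
# Disk descriptor file (sketch) built from Table 7-2, for use with: mmcrnsd -F <file>
# Assumed field order: DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup
/dev/sdb:node1:node2:dataAndMetadata:1
/dev/sdb:node2:node1:dataAndMetadata:2
/dev/sdb:node3:node1:dataAndMetadata:3
/dev/sdc:node1:node2:dataOnly:1
/dev/sdc:node2:node1:dataOnly:2
/dev/sdc:node3:node1:dataAndMetadata:3
```

After mmcrnsd processes the file, it rewrites it with the generated NSD names (gpfs1nsd, gpfs2nsd, and so on), and the rewritten file can then be used as input to mmcrfs when the file systems listed in Table 7-1 are created.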