7.8 Disk definitions
In this section, we show you how to define disks with the NSD server (see 4.2.2, "GPFS Network Shared Disk considerations" on page 81 for information on disk access models in a GPFS cluster).
Note | GPFS does not support using different disk access models within the same nodeset. |
7.8.1 GPFS nodeset with NSD network attached servers
A nodeset with NSD network attached servers means that all disk access and replication is performed through one or two storage attached servers (also known as storage nodes). If your cluster has an internal network segment, that segment will be used for this traffic.
As mentioned in 4.2.2, "GPFS Network Shared Disk considerations" on page 81, NSD network attached disks are connected to one or two storage attached servers only. If a disk is defined with only one storage attached server and that server fails, the disk becomes unavailable to GPFS. If the disk is defined with two NSD network attached servers, GPFS automatically transfers the I/O requests to the backup server.
Creating Network Shared Disks (NSDs)
You will need to create a descriptor file before creating your NSDs. This file should contain information about each disk that will be an NSD, and should have the following syntax:
DeviceName:PrimaryNSDServer:SecondaryNSDServer:DiskUsage:FailureGroup
Where:
DeviceName | The real device name of the external storage partition (such as /dev/sde1). |
PrimaryNSDServer | The host name of the server that the disk is attached to. Remember, you must always use the node names defined in the cluster definitions. |
SecondaryNSDServer | The host name of the server to which the secondary disk attachment is connected. |
DiskUsage | The kind of information to be stored on this disk. The valid values are data, metadata, and dataAndMetadata (default). |
FailureGroup | An integer value (0 to 4000) that identifies the failure group to which this disk belongs. All disks with a common point of failure must belong to the same failure group. The value -1 indicates that the disk has no common point of failure with any other disk in the file system. GPFS uses the failure group information to ensure that no two replicas of data or metadata are placed in the same group and thereby become unavailable due to a single failure. When this field is not specified, GPFS automatically assigns a failure group (higher than 4000) to each disk. |
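If a disk is defined with two NSD network attached servers, the secondary server goes in the third field of the descriptor. The following line is a sketch only: storage002-myri0.cluster.com is a hypothetical secondary storage node used for illustration, not part of our lab configuration:

/dev/sdd1:storage001-myri0.cluster.com:storage002-myri0.cluster.com:dataAndMetadata:1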
Example 7-25 shows a descriptor file named /tmp/descfile, which contains the NSD definitions for our cluster.
Example 7-25: /tmp/descfile file
[root@storage001 root]# cat /tmp/descfile
/dev/sdd1:storage001-myri0.cluster.com::dataAndMetadata:1
/dev/sde1:storage001-myri0.cluster.com::dataAndMetadata:1
/dev/sdf1:storage001-myri0.cluster.com::dataAndMetadata:1
[root@storage001 root]#
Now we can create the Network Shared Disks by using the mmcrnsd command, as shown in Example 7-26.
Example 7-26: mmcrnsd command
[root@storage001 root]# mmcrnsd -F /tmp/descfile
mmcrnsd: Propagating the changes to all affected nodes.
This is an asynchronous process.
[root@storage001 root]#
After the NSDs are successfully created, mmcrnsd rewrites the descriptor file: each original disk device line is commented out, and a new line containing the GPFS-assigned global name for that disk is added below it. Example 7-27 shows the modifications made by the mmcrnsd command.
Example 7-27: /tmp/descfile (modified)
[root@storage001 root]# cat /tmp/descfile
# /dev/sdd1:storage001-myri0.cluster.com::dataAndMetadata:1
gpfs1nsd:::dataAndMetadata:1
# /dev/sde1:storage001-myri0.cluster.com::dataAndMetadata:1
gpfs2nsd:::dataAndMetadata:1
# /dev/sdf1:storage001-myri0.cluster.com::dataAndMetadata:1
gpfs3nsd:::dataAndMetadata:1
[root@storage001 root]#
Sometimes the disk you are using to create a new NSD already contains a previous NSD that has been terminated. In this situation, mmcrnsd may complain that the disk is already an NSD. Example 7-28 shows an example of the error message.
Example 7-28: Error message in mmcrnsd output
[root@node001 root]# mmcrnsd -F /tmp/descfile
mmcrnsd: Disk descriptor /dev/sde1:node001::dataAndMetadata:1 refers to an existing NSD
[root@storage001 root]#
In this case, if you are sure that the disk is not an in-use NSD, you can override the check by using the -v no option. The default is -v yes, which verifies that the devices are not already NSDs. See Example 7-29 for details.
Example 7-29: -v no option
[root@node001 root]# mmcrnsd -F /tmp/descfile -v no
mmcrnsd: Propagating the changes to all affected nodes.
This is an asynchronous process.
[root@storage001 root]#
You can see the new device names by using the mmlsnsd command. Example 7-30 shows the output of the mmlsnsd command.
Example 7-30: mmlsnsd command
[root@storage001 root]# mmlsnsd

 File system   NSD name   Primary node                   Backup node
 ---------------------------------------------------------------------------
 (free disk)   gpfs1nsd   storage001-myri0.cluster.com
 (free disk)   gpfs2nsd   storage001-myri0.cluster.com
 (free disk)   gpfs3nsd   storage001-myri0.cluster.com
[root@storage001 root]#
You can also use the -m or -M parameter in the mmlsnsd command to see the mapping between the specified global NSD disk names and their local device names. Example 7-31 shows the output of the mmlsnsd -m command.
Example 7-31: mmlsnsd -m output
[root@storage001 root]# mmlsnsd -m

 NSD name   PVID               Device      Node name                      Remarks
 -------------------------------------------------------------------------------------
 gpfs1nsd   0A00038D3DB70FE0   /dev/sdd1   storage001-myri0.cluster.com   primary node
 gpfs2nsd   0A00038D3DB70FE1   /dev/sde1   storage001-myri0.cluster.com   primary node
 gpfs3nsd   0A00038D3DB70FE2   /dev/sdf1   storage001-myri0.cluster.com   primary node
[root@storage001 root]#
Creating the GPFS file system
Once your NSDs are ready, you can create the GPFS file system using the mmcrfs command, where you must define the following attributes in this order:
- The mount point.
- The name of the device for the file system.
- The descriptor file (-F).
- The name of the nodeset the file system will reside on (-C), if you defined a nodeset name when creating the cluster.
The mmcrfs command formats the NSDs and gets them ready to mount the file system, and it also adds an entry for the new file system and its mount point to the /etc/fstab file.
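As a sketch, a full invocation with all four attributes in order might look like the following, where nodeset1 is a hypothetical nodeset name (omit -C if you did not name the nodeset when creating the cluster):

# mmcrfs /gpfs gpfs0 -F /tmp/descfile -C nodeset1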
Some of the optional parameters are:
-A [yes|no] | Auto-mount the file system. The default value is yes. |
-B | Block size for the file system. The default value is 256 KB, and it can be changed to 16 KB, 64 KB, 512 KB, or 1024 KB. If you plan to have a file system with a block size of 512 KB or 1024 KB, you must also set the value of the maxblocksize nodeset parameter using the mmchconfig command (see the sketch after this list). |
-M | Maximum metadata replicas (maxMetadataReplicas). The default value is 1 and can be changed to 2. |
-r | Default data replicas. The default value is 1 and the valid values are 1 and 2. This factor cannot be larger than maxDataReplicas. |
-R | Maximum data replicas (maxDataReplicas). The default value is 1; the other valid value is 2. |
-m | Default metadata replicas. The default value is 1; the other valid value is 2. This factor cannot be larger than maxMetadataReplicas. |
-n | Estimated number of nodes that will mount the file system. The default value is 32, and it is used to estimate the size of the data structures for the file system. |
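As an illustration of the -B note above, raising the nodeset-wide maximum block size before creating a file system with a 1024 KB block size could be done with mmchconfig; the value shown is an assumption for this sketch:

# mmchconfig maxblocksize=1024K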
Some of the information above must be defined when the file system is created and cannot be changed later. These parameters are:
- Block size
- maxDataReplicas
- maxMetadataReplicas
- NumNodes
The rest of the file system parameters can be changed with the mmchfs command.
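For example, a minimal sketch of turning off the automatic mount option on our gpfs0 file system after creation would be:

# mmchfs gpfs0 -A no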
When creating the GPFS file systems in our lab environment, we used the default values for block size (256 KB), number of nodes to mount the file system (32), maxDataReplicas (1), and maxMetadataReplicas (1), as shown in Example 7-32.
Example 7-32: Create NSD file system
[root@storage001 root]# mmcrfs /gpfs gpfs0 -F /tmp/descfile -A yes

The following disks of gpfs0 will be formatted on node storage001.cluster.com:
    gpfs1nsd: size 71007268 KB
    gpfs2nsd: size 71007268 KB
    gpfs3nsd: size 71007268 KB
Formatting file system ...
Creating Inode File
  19 % complete on Wed Oct 23 16:24:14 2002
  39 % complete on Wed Oct 23 16:24:19 2002
  59 % complete on Wed Oct 23 16:24:24 2002
  78 % complete on Wed Oct 23 16:24:29 2002
  98 % complete on Wed Oct 23 16:24:34 2002
 100 % complete on Wed Oct 23 16:24:35 2002
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Flushing Allocation Maps
Completed creation of file system /dev/gpfs0.
mmcrfs: Propagating the changes to all affected nodes.
This is an asynchronous process.
[root@storage001 root]#
Sometimes you may receive an error message when creating a file system using NSDs that were previously used for other file systems, as shown in Example 7-33. If you are sure that the disks are no longer in use, you can override the verification by issuing the mmcrfs command with the -v no option. The output will then be the same as in the previous example.
Example 7-33: Error message with mmcrfs command
[root@storage001 root]# mmcrfs /gpfs gpfs0 -F /tmp/descfile -A yes
mmcrfs: There is already an existing file system using gpfs0
[root@storage001 root]#
After creating the file system, you can run the mmlsfs command to display your file system attributes. Example 7-34 on page 218 shows the output of the mmlsfs command.
Example 7-34: mmlsfs command output
[root@storage001 root]# mmlsfs gpfs0
flag value          description
---- -------------- -----------------------------------------------------
 -s  roundRobin     Stripe method
 -f  8192           Minimum fragment size in bytes
 -i  512            Inode size in bytes
 -I  16384          Indirect block size in bytes
 -m  1              Default number of metadata replicas
 -M  1              Maximum number of metadata replicas
 -r  1              Default number of data replicas
 -R  1              Maximum number of data replicas
 -D  posix          Do DENY WRITE/ALL locks block NFS writes(cifs) or not(posix)?
 -a  1048576        Estimated average file size
 -n  32             Estimated number of nodes that will mount file system
 -B  262144         Block size
 -Q  none           Quotas enforced
     none           Default quotas enabled
 -F  104448         Maximum number of inodes
 -V  6.00           File system version. Highest supported version: 6.00
 -d  gpfs1nsd       Disks in file system
 -A  yes            Automatic mount option
 -C  1              GPFS nodeset identifier
 -E  no             Exact mtime default mount option
 -S  no             Suppress atime default mount option
 -o  none           Additional mount options
[root@storage001 root]#
You can also run the mmlsdisk command to display the current configuration and state of the disks in a file system. Example 7-35 shows the output of the mmlsdisk command.
Example 7-35: mmlsdisk command
[root@storage001 root]# mmlsdisk gpfs0
disk         driver   sector failure holds    holds
name         type     size   group   metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
gpfs1nsd     nsd      512    1       yes      yes   ready         up
gpfs2nsd     nsd      512    1       yes      yes   ready         up
gpfs3nsd     nsd      512    1       yes      yes   ready         up
[root@storage001 root]#
After creating the file system, GPFS adds an entry for the new file system to /etc/fstab, as shown in Example 7-36 on page 219.
Example 7-36: /etc/fstab file
[root@storage001 root]# less /etc/fstab
...
/dev/gpfs0    /gpfs    gpfs    dev=/dev/gpfs0,autostart 0 0
...
[root@storage001 root]#
Mounting the GPFS file system
The newly created GPFS file system is not mounted automatically right after the GPFS cluster has been installed. To mount the GPFS file system on all nodes after creating the GPFS cluster, go to the management node and run:
# dsh -av mount /gpfs
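To confirm that the file system is mounted everywhere, a quick check (a sketch, assuming dsh is set up as elsewhere in this chapter) is:

# dsh -av df /gpfs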
Unless you use the -A no parameter with the mmcrfs command, your GPFS file system will be mounted automatically every time you start GPFS.
Note | When trying to mount the GPFS file system in our ITSO lab environment using the mount /dev/gpfs /gpfs command, we received several kernel error messages. This may have been because the operating system did not recognize the file system type used by GPFS. |
7.8.2 GPFS nodeset with direct attached disks
The creation of the disks in an environment with direct attached disks is quite similar to the steps described for the NSD server environment in 7.8.1, "GPFS nodeset with NSD network attached servers" on page 213. The differences relate to how the disks are accessed.
Defining disks
In this case, the disks are not attached to one server only; all disks have a direct connection to all of the nodes through the Fibre Channel switch. Therefore, there is no need to specify primary or secondary servers for any of the disks, and the second, third, and fifth fields of the disk descriptor file are specified differently.
The primary and secondary server fields must be left null, and the last field must indicate that there is no common point of failure with any other disk in the nodeset. This can be done by specifying a failure group of -1, as in Example 7-37 on page 220.
Example 7-37: Failure group of -1
[root@storage /tmp]# cat disk_def
/dev/sda:::dataAndMetadata:-1
/dev/sdb:::dataAndMetadata:-1
/dev/sdc:::dataAndMetadata:-1
/dev/sdd:::dataAndMetadata:-1
/dev/sde:::dataAndMetadata:-1
[root@storage /tmp]#
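The NSDs themselves are then created with mmcrnsd, exactly as in 7.8.1, "GPFS nodeset with NSD network attached servers" on page 213. As a sketch, assuming the descriptor file above was saved as /tmp/disk_def:

# mmcrnsd -F /tmp/disk_def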
After defining the disks, you can verify them using the mmlsnsd -M command. The -M option shows the underlying device for each NSD on every node, as in Example 7-38.
Example 7-38: mmlsnsd command
[root@node1 /root]# mmlsnsd -M

 NSD name   PVID               Device     Node name   Remarks
 ---------------------------------------------------------------------------------------
 gpfs1nsd   C0A800E93BE1DAF6   /dev/sda   node1       directly attached
 gpfs1nsd   C0A800E93BE1DAF6   /dev/sdb   node2       directly attached
 gpfs1nsd   C0A800E93BE1DAF6   /dev/sdb   node3       directly attached
 gpfs2nsd   C0A800E93BE1DAF7   /dev/sdb   node1       directly attached
 gpfs2nsd   C0A800E93BE1DAF7   /dev/sdc   node2       directly attached
 gpfs2nsd   C0A800E93BE1DAF7   /dev/sdc   node3       directly attached
 gpfs3nsd   C0A800E93BE1DAF8   /dev/sdc   node1       directly attached
 gpfs3nsd   C0A800E93BE1DAF8   /dev/sdd   node2       directly attached
 gpfs3nsd   C0A800E93BE1DAF8   /dev/sdd   node3       directly attached
 gpfs4nsd   C0A800E93BFA7D86   /dev/sdd   node1       directly attached
 gpfs4nsd   C0A800E93BFA7D86   /dev/sde   node2       directly attached
 gpfs4nsd   C0A800E93BFA7D86   /dev/sde   node3       directly attached
 gpfs5nsd   C0A800E93BFA7D87   /dev/sde   node1       directly attached
 gpfs5nsd   C0A800E93BFA7D87   /dev/sdf   node2       directly attached
 gpfs5nsd   C0A800E93BFA7D87   /dev/sdf   node3       directly attached
[root@node1 /root]#
It is important to note that the servers do not need to have the same disk structure or number of internal disks, so the device names for the same disk can differ from server to server. In Example 7-38, for instance, the first disk, with disk ID C0A800E93BE1DAF6, is named /dev/sda on node1, while on node2 it is named /dev/sdb.
GPFS refers to the disks using the disk ID, so you do not have to worry about the /dev/ names for the disks being different among the GPFS nodes.