Quintero - Deploying Linux on IBM E-Server Pseries Clusters


4.5 Linux rescue methods

You may often face problems with boot loaders, file systems, configuration files, and so on. In some cases, these problems prevent you from booting the system to fix or debug it. In this section, we discuss common problems that you may encounter and describe solutions. Figure 4-7 shows the problem determination flow.

Figure 4-7. Flow chart of debugging Linux for pSeries

4.5.1 Boot loader corruption

The most common problem that administrators face in Linux for pSeries is that the system is unable to boot after installation. This can also happen if you have accidentally overwritten your boot loader. If yaboot, the boot loader stored in the PReP partition, is corrupted, you will need to create a new PReP partition and reactivate it.

PReP boot loader corrupted

Before diagnosing boot loader corruption, make sure the system or LPAR boots as far as the E1F1 code on the LED panel and then proceeds to load the respective boot loader from the disk.

  1. Boot from the network or from CD-ROM and load the appropriate driver that you may need for your system. Refer to 2.1.4, "Review your choices" on page 23 for the basic device drivers used by adapters and Linux on pSeries.

  2. Load all the modules that your system needs, as shown in Figure 4-8. For example, if you plan to boot the installation from an external SCSI CD-ROM, you will need to load the driver for the SCSI card.

    Figure 4-8. Loading modules from the SuSE Installer

  3. After loading the required modules, select Start Rescue System, shown in Figure 4-9 on page 186, and then select the location of your kernel. You can boot from CD, network, hard disk, or floppy. When you boot into rescue mode, the SLES Installer gives you a small Linux operating system that runs from a ramdisk, with a "Rescue" prompt and full root access.

    Figure 4-9. Booting into rescue mode for recovery

  4. Do a file system check on your disk:

    Rescue :/ # fsck.reiserfs /dev/<disk>

  5. Create a new partition with the fdisk command, set the partition type to PReP Boot (ID 41), and mark it as the active boot partition. The partition should not exceed 8 MB, as it only contains the image that is used to boot the system.

    Rescue :/ # fdisk /dev/<disk>

    Select option "n" and add a primary partition. If you have an existing partition, use option "d" to delete it first. Make sure the partition is smaller than 8 MB. Then use option "t" to change the partition type to 41, which is the type for PPC PReP Boot. Use option "a" to mark the partition active and option "w" to write the partition table and exit.

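  6. Recreate the PReP Boot image using the dd command. Here <disk> must name the PReP partition created in step 5 (for example, /dev/sda1), not the whole disk, because writing to the whole-disk device would overwrite the partition table: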

    Rescue :/ # dd if=/boot/yaboot.chrp of=/dev/<disk> bs=4k

  7. Reboot the system. If everything goes right, you should now be able to boot your system without any problem.
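Before writing the boot image in step 6, it can help to confirm which partition actually carries the PReP type. The following is a minimal sketch and not part of the original procedure: the helper name find_prep_partition is hypothetical, and it assumes the column layout that fdisk -l prints in Example 4-15 (an optional * boot flag, followed by Start, End, Blocks, and Id).

```shell
#!/bin/sh
# Hypothetical helper: read `fdisk -l` output on stdin and print the device
# name of every partition whose type Id is 41 (PPC PReP Boot).
find_prep_partition() {
    awk '
        $1 ~ /^\/dev\// {
            # When the boot flag "*" is present, every column shifts by one.
            id = ($2 == "*") ? $6 : $5
            if (id == "41") print $1
        }
    '
}

# Sample input copied from the fdisk listing in Example 4-15.
sample='Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 1 8001 41 PPC PReP Boot
/dev/sda3 15 537 4200997+ 83 Linux
/dev/sda5 538 799 2104483+ 82 Linux swap'

printf '%s\n' "$sample" | find_prep_partition   # prints /dev/sda1
```

On a live rescue system the same function could be fed directly, for example with fdisk -l /dev/sda | find_prep_partition, and the device it prints is the one to pass to dd as of=.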

Tip

If you have a DHCP and NFS server, you can place the zImage.initrd.ppc64-2.4.21 kernel file on that server. The file is available on the first SLES8 CD. Set the server or LPAR to boot from this kernel image. In this way, you can rescue or boot your system even if the PReP boot partition is corrupted.

4.5.2 File system corruption

Very often, when a server is not shut down properly, its file systems or individual files risk corruption. Although a journaled file system helps in many cases, it is not foolproof. There are cases where you might need to rebuild the logs and database structure.

File system corruption

  1. If you have file system corruption or configuration file corruption, you can boot the system into single user mode. If it is your root partition that is corrupted, skip this step and proceed to step 3.

    If you choose not to use yaboot for booting automatically (for example, in dual-boot systems), you should still create the PReP boot loader partition, but it must not be active. You can then boot your system to the Open Firmware prompt and pass the required parameters at the yaboot prompt:

    0> boot disk

    When it reaches the yaboot prompt, key in the following:

    yaboot : linux single console=hvc0

    Figure 4-10 on page 188 shows the diagram of booting the disk from the openfirmware prompt to the yaboot prompt.

    Figure 4-10. Boot up system into rescue from open firmware

  2. Run a file system check on the file system:

    (none):~ # fsck.reiserfs /dev/<disk>

  3. If this did not fix your problem, you will need to boot into rescue mode from CD1 of the SLES set. After booting into rescue mode, rerun the fsck.reiserfs command. If you are using another file system, the command will differ.

    After that, create a new mount point in the rescue system and then mount your file systems over it.

  4. Mount the root file system into a mount point:

    Rescue:/ # mount -t reiserfs /dev/sda2 /mnt/<mount_point>

  5. If necessary, modify /etc/fstab so that the system can boot up correctly.
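When reviewing /etc/fstab from the rescue system, a compact listing of the active entries makes it easier to spot a device name that no longer matches the disk layout. The sketch below is illustrative only: the function name list_fstab_entries is hypothetical, and the demo entries are made up rather than taken from a real system.

```shell
#!/bin/sh
# Hypothetical helper: print "device mountpoint fstype" for every active
# (non-blank, non-comment) entry of the fstab file given as $1.
list_fstab_entries() {
    awk 'NF && $1 !~ /^#/ { print $1, $2, $3 }' "$1"
}

# Demo against a throwaway file; these entries are invented examples.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# device     mountpoint  type      options   dump pass
/dev/sda3    /           reiserfs  defaults  1 1

/dev/sda5    swap        swap      pri=42    0 0
EOF
list_fstab_entries "$demo"
rm -f "$demo"
```

In the rescue system, the argument would be the fstab under the mount point chosen in step 4, that is, the etc/fstab file below that directory.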

Should you need to reset the root password, change the root directory to the mounted root file system (mounted at /mnt/root in this example) and then set a new root password:

Rescue:/ # chroot /mnt/root
Rescue:/ # passwd root

4.5.3 RHAS 3 rescue mode

For RHAS 3, you can use the installation disk in rescue mode to get quick access to your disk partitions so that you can recover and repair a corrupted Linux system. To boot into rescue mode, boot from the CD-ROM and, at the yaboot prompt, enter:

yaboot : linux rescue

If you do not have a CD-ROM attached to the system, you can boot the system into Open Firmware and run the following command. To reach the Open Firmware prompt, press the 8 key when the LED shows E1F1.

0 > boot net rescue

Once the kernel is loaded, select the language and the location of the rescue image. The installation program then attempts to mount the disk partitions on your system and presents you with a shell prompt, where you can perform the necessary rescue steps. To exit, type exit 0; this automatically reboots the system.

Refer to 2.3.3, "Unattended installation" on page 67 for information about how to set up the network boot.

4.5.4 Using /proc file systems

The /proc file system in Linux provides real-time information about the kernel and the hardware devices that are present in the server. Some of its entries are read-only; others are read-write, which allows you to modify or tune the hardware for better performance. Refer to "File system tuning" on page 208 for information on how to tune your system using /proc.

Some commonly used commands on Linux are listed in Table 4-4 on page 190.

Table 4-4. Commands used on Linux

Command                     Description

# procinfo                  Provides a brief overview of the system.

# cat /proc/cpuinfo         Displays CPU information, including clock speed.

# cat /proc/ppc64/lparcfg   Displays a snapshot of the current LPAR configuration; this is
                            useful for servers attached to an HMC. Sample output:

                            # cat /proc/ppc64/lparcfg
                            serial_number=IBM,xxxxxxxx
                            system_type=IBM,7038-6M2
                            partition_id=1
                            system_active_processors=2
                            system_potential_processors=2
                            partition_active_processors=2
                            partition_potential_processors=2
                            partition_entitled_capacity=200
                            partition_max_entitled_capacity=400
                            shared_processor_mode=0

# hwscan --list             Displays the list of system devices and adapters, classified
                            by device type.

# hwinfo                    Displays detailed output for all the hardware that is currently
                            in the server. This is very useful for administrators trying to
                            analyze and debug hardware problems.
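Because /proc/ppc64/lparcfg prints plain key=value lines, individual settings are easy to query from a script. The following is a minimal sketch under that assumption; the function name lparcfg_get is hypothetical, and the demo reads a scratch copy of a few lines from the sample output rather than the live /proc entry.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from lparcfg-style
# "key=value" text. $1 is the key; $2 is the file to read and defaults
# to the live /proc entry.
lparcfg_get() {
    awk -F= -v key="$1" '
        $1 == key {
            # Print everything after "key=", even if the value contains "=".
            print substr($0, length(key) + 2)
            exit
        }
    ' "${2:-/proc/ppc64/lparcfg}"
}

# Demo against a scratch file holding lines from the sample output.
demo=$(mktemp)
cat > "$demo" <<'EOF'
system_type=IBM,7038-6M2
partition_id=1
partition_entitled_capacity=200
partition_max_entitled_capacity=400
EOF
lparcfg_get partition_entitled_capacity "$demo"   # prints 200
rm -f "$demo"
```

On an LPAR, calling lparcfg_get partition_id with no second argument would read the value straight from /proc/ppc64/lparcfg.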

In addition to displaying details about the devices in /proc, you can also use /proc to remove and add SCSI devices on the fly. To list the SCSI devices you have on your system, run the command cat /proc/scsi/scsi. Example 4-14 shows the output of the command.

Example 4-14. Display SCSI devices from /proc

lpar7:/proc/scsi # cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: CDRM00203     !K Rev: 1_06
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: IBM      Model: IC35L146UCDY10-0 Rev: S25F
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 09 Lun: 00
  Vendor: IBM      Model: IC35L146UCDY10-0 Rev: S25F
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 14 Lun: 00
  Vendor: IBM      Model: HSBPM2   PU2SCSI Rev: 0015
  Type:   Enclosure                        ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 15 Lun: 00
  Vendor: IBM      Model: HSBPD4M  PU3SCSI Rev: 0015
  Type:   Enclosure                        ANSI SCSI revision: 02

Example 4-14 lists the devices attached to the system. For each device, the first line describes how the hardware is connected (host adapter, channel, target ID, and LUN), followed by the vendor and the type of device. Existing devices can be removed using the command echo "scsi remove-single-device <h> <b> <t> <l>" > /proc/scsi/scsi, where <h> is the host adapter, <b> is the channel ID, <t> is the SCSI target ID, and <l> is the LUN. After this, run the command cat /proc/scsi/scsi to see whether the removal was successful. Example 4-15 shows the removal of a single SCSI disk (/dev/sdb).

Example 4-15. Removing a single device in Linux

lpar7:/proc/scsi # fdisk -l
Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1         1      8001   41  PPC PReP Boot
/dev/sda3            15       537   4200997+  83  Linux
/dev/sda4           538     17848 139050607+   5  Extended
/dev/sda5           538       799   2104483+  82  Linux swap
/dev/sda6           800     17848 136946061   fd  Linux raid autodetect
Disk /dev/sdb: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
lpar7:/proc/scsi # echo "scsi remove-single-device 0 0 9 0" > /proc/scsi/scsi
lpar7:/proc/scsi # fdisk -l
Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1         1      8001   41  PPC PReP Boot
/dev/sda3            15       537   4200997+  83  Linux
/dev/sda4           538     17848 139050607+   5  Extended
/dev/sda5           538       799   2104483+  82  Linux swap
/dev/sda6           800     17848 136946061   fd  Linux raid autodetect
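The four numbers passed to remove-single-device come straight from the Host: lines of /proc/scsi/scsi. As a rough sketch, they can be extracted with a small filter; the function name scsi_hbtl is hypothetical, and it assumes the "Host: scsiN Channel: NN Id: NN Lun: NN" layout shown in Example 4-14.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/scsi/scsi-style text on stdin and print
# "<h> <b> <t> <l>" (host, channel, target ID, LUN) for each attached device.
scsi_hbtl() {
    awk '
        $1 == "Host:" {
            host = $2
            sub(/^scsi/, "", host)      # "scsi0" -> "0"
            # Adding 0 strips the leading zeros ("09" -> 9).
            print host + 0, $4 + 0, $6 + 0, $8 + 0
        }
    '
}

# Sample input: the two disk entries from Example 4-14.
sample='Attached devices:
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: IBM Model: IC35L146UCDY10-0 Rev: S25F
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 09 Lun: 00
  Vendor: IBM Model: IC35L146UCDY10-0 Rev: S25F
  Type: Direct-Access ANSI SCSI revision: 03'

printf '%s\n' "$sample" | scsi_hbtl
# prints:
# 0 0 8 0
# 0 0 9 0
```

The second line of output, 0 0 9 0, matches the arguments used in Example 4-15 to remove the disk at target ID 9.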

You can also add a new SCSI device by using the command echo "scsi add-single-device <h> <b> <t> <l>" > /proc/scsi/scsi. In Example 4-16, we add the SCSI disk we removed in Example 4-15 back to the system.

Example 4-16. Adding a SCSI disk

lpar7:/proc/scsi # echo "scsi add-single-device 0 0 9 0" > /proc/scsi/scsi
lpar7:/proc/scsi # fdisk -l
Disk /dev/sda: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1         1      8001   41  PPC PReP Boot
/dev/sda3            15       537   4200997+  83  Linux
/dev/sda4           538     17848 139050607+   5  Extended
/dev/sda5           538       799   2104483+  82  Linux swap
/dev/sda6           800     17848 136946061   fd  Linux raid autodetect
Disk /dev/sdb: 255 heads, 63 sectors, 17849 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
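Because a mistyped echo into /proc/scsi/scsi can detach the wrong device, the two commands can be wrapped in a function that validates its arguments first. This is only a sketch: the function name scsi_hotplug and the SCSI_PROC override variable are hypothetical conveniences, not part of any distribution, and the demo writes to a scratch file instead of the real /proc entry.

```shell
#!/bin/sh
# Hypothetical wrapper around the add/remove echo commands.
# Usage: scsi_hotplug add|remove <h> <b> <t> <l>
# Set SCSI_PROC to a scratch file to rehearse without touching /proc.
scsi_hotplug() {
    case "$1" in
        add|remove) ;;
        *) echo "usage: scsi_hotplug add|remove <h> <b> <t> <l>" >&2
           return 2 ;;
    esac
    [ "$#" -eq 5 ] || { echo "expected four numbers after the action" >&2; return 2; }
    action=$1
    shift
    # Reject anything that is not a plain non-negative integer.
    for n in "$@"; do
        case "$n" in
            ''|*[!0-9]*) echo "not a number: $n" >&2; return 2 ;;
        esac
    done
    echo "scsi $action-single-device $1 $2 $3 $4" > "${SCSI_PROC:-/proc/scsi/scsi}"
}

# Rehearsal: the command string lands in a scratch file, not in /proc.
SCSI_PROC=$(mktemp)
scsi_hotplug remove 0 0 9 0
cat "$SCSI_PROC"        # prints: scsi remove-single-device 0 0 9 0
rm -f "$SCSI_PROC"
unset SCSI_PROC
```

With SCSI_PROC unset, the function writes to /proc/scsi/scsi, so scsi_hotplug add 0 0 9 0 would be equivalent to the echo command in Example 4-16.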
