UNIX Fault Management: A Guide for System Administrators


Standards for network and system management, such as the Simple Network Management Protocol (SNMP) and the Desktop Management Interface (DMI), were developed to help make management easier. They provide industry-standard ways to build instrumentation and to interface with that instrumentation. SNMP is used to access objects defined in Management Information Bases (MIBs), and DMI is used to access information described in Management Information Format (MIF) files.

Standard MIBs and MIFs define the metrics that can be used by any vendor's instrumentation. Vendor-specific MIBs and MIFs provide vendor-specific instrumentation. This section looks at some of the system instrumentation available through each of these standards.

Many tools already exist for accessing this instrumentation. Several vendors offer browsers and monitoring capabilities that use a common interface to access instrumented objects from different hardware platforms and operating systems. For example, the common enterprise management frameworks, such as the HP Network Node Manager, include a MIB Browser tool to access MIB data. They may also include tools that can be used to monitor MIB data on remote systems from the enterprise management platform. Toolkits are available that provide an interface with which people can write their own tools to monitor or track this information. Furthermore, toolkits exist for creating your own instrumentation.

Many valuable system resources can be monitored via these standard interfaces, to detect system events or faults. Some of the resources that you may be interested in are reviewed in this section.

SNMP

A MIB is a standard way of representing information of a certain category. For example, MIB-II provides useful information about a system, such as the number of active TCP connections, system hardware and version information, and so forth. OpenView IT/Operations (IT/O), discussed later in this chapter, provides a MIB Browser. The MIB Browser tool helps you to discover which MIBs are available, and to see the information being provided by each MIB. The MIB Browser tool can check the value of anything contained in a MIB. If you find a MIB that contains some useful fields, you can use the MIB Browser to gather that data from the target system. The resulting data is displayed in the MIB Browser's output window on the screen. By browsing through available MIBs, and by querying values of selected MIB fields, you can gather specific information needed to monitor systems and troubleshoot problems.
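The same kind of single-value query that a MIB Browser performs can also be scripted. The following is a minimal sketch, assuming that the Net-SNMP `snmpget` command-line tool (not the OpenView MIB Browser described above) is installed on the management station and that the target runs an SNMPv2c agent. The host name and the community string `public` are placeholders chosen for illustration.

```python
import subprocess

# MIB-II sysName; the trailing ".0" selects the single scalar instance.
SYS_NAME_OID = "1.3.6.1.2.1.1.5.0"


def build_snmpget_cmd(host, oid, community="public"):
    """Return the argument list for a Net-SNMP `snmpget` SNMPv2c query."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def query(host, oid, community="public", timeout=10):
    """Run snmpget and return its raw text output.

    Raises subprocess.CalledProcessError if the command fails and
    subprocess.TimeoutExpired if the agent does not answer in time.
    """
    result = subprocess.run(
        build_snmpget_cmd(host, oid, community),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "target.example.com" is a hypothetical host name.
    print(query("target.example.com", SYS_NAME_OID))
```

Keeping the command construction separate from its execution makes it easy to log or review exactly what will be sent before any system is queried.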

The SNMP interface provides access to objects stored in various MIBs. MIB-II is a standard MIB that has been implemented on most UNIX systems. On HP-UX systems, the HP-UNIX MIB defines various metrics for monitoring system resources. Other vendors, such as Sun, have vendor-specific MIBs that provide similar information. Appendix A includes complete MIB definitions.

MIB-II, the "System MIB," is a standard repository for information about a computer system, and is supported on a variety of platforms, including UNIX and Windows NT. It contains descriptive information, such as the system name, the system contact, and the length of time that the system has been running, as well as statistics from the key networking protocols, such as TCP, UDP, and IP. These statistics include packet transmission counts and error counts. Table 4-1 lists several variables in MIB-II that will help you to monitor system resources effectively. Both the actual MIB variable name and a description are provided for each variable.

The HP-UNIX MIB contains important information about the users, jobs, filesystems, memory, and processes of a system. The number of users logged in to the system and number of jobs running are both indications of how busy the system is. Reduced amounts of free swap space or filesystem space can serve as warnings of potential problems. The process status can be checked to see whether a particular application is still running normally on the target system. Table 4-2 contains some of the interesting metrics from the HP-UNIX MIB for monitoring system resources.

Table 4-1. Important MIB-II Fields to Monitor

MIB Variable Name   Description
sysDescr            System description
sysObjectID         Vendor-assigned identifier for the system's management subsystem
sysUpTime           Time since the network management portion of the system was last reinitialized, in hundredths of a second (commonly read as the time since the last reboot)
sysContact          System contact person
sysName             System name
sysLocation         System location
sysServices         The network services performed by this system
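
Two of the fields in Table 4-1 are easier to use after some decoding. The sketch below lists the standard object identifiers of the MIB-II system group, converts a sysUpTime value (a count of hundredths of a second) into a readable duration, and decodes the sysServices value, which MIB-II defines as a sum of 2^(L-1) for each protocol layer L that the system serves. The function names and the sample values are illustrative assumptions, not part of any tool described in this chapter.

```python
# Object identifiers for the scalar instances of the MIB-II system group.
SYSTEM_GROUP_OIDS = {
    "sysDescr": "1.3.6.1.2.1.1.1.0",
    "sysObjectID": "1.3.6.1.2.1.1.2.0",
    "sysUpTime": "1.3.6.1.2.1.1.3.0",
    "sysContact": "1.3.6.1.2.1.1.4.0",
    "sysName": "1.3.6.1.2.1.1.5.0",
    "sysLocation": "1.3.6.1.2.1.1.6.0",
    "sysServices": "1.3.6.1.2.1.1.7.0",
}

# Layers named in the MIB-II definition of sysServices.
SERVICE_LAYERS = {
    1: "physical",
    2: "datalink/subnetwork",
    3: "internet",
    4: "end-to-end",
    7: "applications",
}


def format_uptime(ticks):
    """Convert TimeTicks (hundredths of a second) to 'Nd HH:MM:SS'."""
    seconds = ticks // 100
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


def decode_services(value):
    """Return the names of the layers whose bit is set in sysServices."""
    return [
        name
        for layer, name in sorted(SERVICE_LAYERS.items())
        if value & (1 << (layer - 1))
    ]


if __name__ == "__main__":
    print(format_uptime(8640000))   # sample value: 86,400 seconds
    print(decode_services(72))      # sample value: 2**3 + 2**6
```

A sysUpTime value that is suddenly much smaller than the previous sample is a simple indication that the agent, and possibly the whole system, has restarted.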

Table 4-2. Important HP-UNIX Variables to Monitor

MIB Variable Name          Description
computerSystemUsers        Current number of users on the system
computerSystemAvgJobs1     Average job queue length over the last minute
computerSystemAvgJobs5     Average job queue length over the last 5 minutes
computerSystemAvgJobs15    Average job queue length over the last 15 minutes
computerSystemMaxProc      Maximum number of processes allowed in the system
computerSystemFreeMemory   Amount of free memory
computerSystemPhysMemory   Amount of physical memory
computerSystemMaxUserMem   Maximum user memory
computerSystemSwapConfig   Amount of swap space configured
computerSystemEnabledSwap  Amount of swap enabled via swapon
computerSystemFreeSwap     Amount of free swap space
computerSystemUserCPU      Amount of CPU used by users
computerSystemSysCPU       Amount of CPU used by the system
computerSystemIdleCPU      Amount of idle CPU
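
Once values such as those in Table 4-2 have been collected, turning them into warnings is a matter of comparing them against thresholds. The following is a minimal sketch, assuming that the values have already been retrieved into a dictionary keyed by MIB variable name, and that computerSystemFreeSwap and computerSystemEnabledSwap are reported in the same units. The function name, the 10 percent threshold, and the sample numbers are hypothetical.

```python
def check_system(sample, min_free_swap_ratio=0.10, max_users=None):
    """Return a list of warning strings for one sample of HP-UNIX MIB values.

    `sample` maps MIB variable names to numbers. The thresholds are
    illustrative defaults, not recommended values.
    """
    warnings = []

    enabled = sample.get("computerSystemEnabledSwap", 0)
    free = sample.get("computerSystemFreeSwap", 0)
    # Compare as a ratio so that the unit of measurement cancels out.
    if enabled > 0 and free / enabled < min_free_swap_ratio:
        warnings.append(
            f"free swap is below {min_free_swap_ratio:.0%} of enabled swap"
        )

    users = sample.get("computerSystemUsers", 0)
    if max_users is not None and users > max_users:
        warnings.append(f"{users} users logged in (limit {max_users})")

    return warnings


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "computerSystemEnabledSwap": 1000,
        "computerSystemFreeSwap": 50,
        "computerSystemUsers": 12,
    }
    for message in check_system(sample, max_users=10):
        print("WARNING:", message)
```

Appropriate thresholds depend on the workload, so they are passed as parameters rather than fixed in the code.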

DMI

System resource information can also be retrieved by using the Desktop Management Interface (DMI), which is another standard for storing and accessing management information. Management information is represented in a text file in the Management Information Format (MIF). Management information is divided into components. Each component has a Service Provider (SP) that is responsible for providing DMI information to the management applications that request it.
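
Because a MIF is a text file built from nested `Start`/`End` blocks containing `keyword = value` statements, its attributes can be picked out with very little code. The sketch below is a simplified illustration only: the fragment is hand-written for this example rather than taken from a real System MIF, the component, group, and attribute names in it are invented, and the parser ignores most of the MIF grammar (comments, enumerations, tables, and so on).

```python
# Hand-written, simplified fragment in the style of a MIF file.
# All names and values below are invented for illustration.
MIF_FRAGMENT = '''
Start Component
  Name = "Example System"
  Start Group
    Name = "General Information"
    ID = 1
    Class = "Example|General Information|001"
    Start Attribute
      Name = "System Name"
      ID = 1
      Access = Read-Only
      Type = String(64)
      Value = "host1"
    End Attribute
    Start Attribute
      Name = "System Contact"
      ID = 2
      Access = Read-Write
      Type = String(64)
      Value = "admin@example.com"
    End Attribute
  End Group
End Component
'''


def parse_attributes(text):
    """Return {attribute name: value} for each attribute block in `text`."""
    attributes = {}
    current = None  # statements of the attribute block being read, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        lowered = line.lower()
        if lowered == "start attribute":
            current = {}
        elif lowered == "end attribute":
            if current is not None and "name" in current:
                attributes[current["name"]] = current.get("value")
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip().lower()] = value.strip().strip('"')
    return attributes


if __name__ == "__main__":
    for name, value in parse_attributes(MIF_FRAGMENT).items():
        print(f"{name}: {value}")
```

In practice, management applications obtain these values through the Service Provider rather than by reading MIF files directly; the sketch is meant only to show how the format is organized.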

Several system platforms, including HP-UX, provide instrumentation for the System MIF and the Software MIF. Appendix A contains a complete listing of these MIFs.

Similar to MIB-II, the System MIF can be used to get generic system information. It includes the system name, boot time, contact information, uptime, and the number of users, as well as some information about the filesystems and disks.

The Software MIF provides information about the software products and product bundles installed on the system. The Software MIF can be a useful tool after a problem with a product has been discovered. By using a MIF Browser, you can examine the Software MIF to see whether the problem might be caused by a bad patch installation or a modified file. The MIF contains revision information for each product, along with its creation and modification times. Version information can be checked to see whether a compatibility problem exists. Finally, the product's vendor information is provided in case product support personnel need to be contacted.
