HP-UX Virtual Partitions

   

By Marty Poniatowski

Chapter 12.  Performance Topics

HP GlancePlus/UX

Using UNIX commands to get a better understanding of what your system is doing requires you to do a lot of work. Issuing UNIX commands does give you the advantage of obtaining data about what is taking place on your system that very second. Unfortunately, you can't always issue additional commands to probe more deeply into an area, such as a process, about which you want to know more.

Now I'll describe another technique, a tool that can help get useful data in real time, allow you to investigate a specific process, and not bury you in reports. This tool is HP GlancePlus/UX (GlancePlus). This tool runs on several UNIX variants, including Solaris, HP-UX, and AIX.

GlancePlus can be run in character mode or in graphic mode. I chose to use the character-based version of GlancePlus, because this will run on any display, either graphics- or character-based, and the many colors used by the Motif version of GlancePlus do not show up well in a book. My examples are displayed much more clearly in the book when using the character mode. I recommend that you try both versions of GlancePlus to see which you prefer.

The system used in the examples has eight processors, 4 GBytes of RAM, and a substantial amount of EMC Symmetrix disk connected to it.

Figure 12-4 shows one of several interactive screens of GlancePlus. This one is the Process List screen, also referred to as the Global screen. This is the default screen when bringing up GlancePlus.

Figure 12-4. HP GlancePlus/UX Process List Screen Shot

Two features of the screen shown in Figure 12-4 are worth noticing immediately:

  1. Four histograms at the top of the screen give you a graphical representation of your CPU, Disk, Memory, and Swap Utilization in a format much easier to assimilate than a column of numbers.

  2. The "Process Summary" has columns similar to ps -ef, with which many system administrators are familiar and comfortable. GlancePlus, however, gives you the additional capability of filtering out processes that are using very few resources by specifying thresholds.

Using GlancePlus, you can take a close look at your system in many areas, including the following:

  • Process List

  • CPU Report

  • Memory Report

  • Swap Space

  • Disk Report

  • LAN Detail

  • NFS by System

  • PRM Summary (Process Resource Manager)

  • I/O by File System

  • I/O by Disk

  • I/O by Logical Volume

  • System Tables

Figure 12-4 is a GlancePlus screen shot.

Because the Process List shown in the example tells you where your system resources are going at the highest level, I'll start my description here. I am using a terminal emulator on my portable computer to display GlancePlus. I find that many system administrators use a PC and a terminal emulator to perform UNIX management functions. Keep in mind that the information shown on this screen can be updated at any interval you choose. If your system is running in a steady-state mode, you may want to have a long interval because you don't expect things to change much. On the other hand, you may have a dynamic environment and want to see the histograms and other information updated every few seconds. In either case, you can change the update interval to suit your needs. You can use the function keys at the bottom of the screen to go into other functional areas.

Process List Description

The Process List screen provides an overview of the state of system resources and active processes.

The top section of the screen (the histogram section) is common to the many screens of GlancePlus. The bottom section of the screen displays a summary of active processes.

Line 1 provides the product and version number of GlancePlus, the time, name of your system, and system type. In this case, we are running version 11.01 of GlancePlus.

Line 3 provides information about the overall state of the CPU. This tends to be the single most important piece of information that administrators want to know about their system: Is my CPU overworked?

The CPU Utilization bar is divided into the following parts:

  1. "S" indicates the amount of time spent on "system" activities such as context switching and system calls.

  2. "N" indicates the amount of time spent running "nice" user processes (those run at a low priority).

  3. "U" indicates the amount of time spent running user processes.

  4. "R" indicates real-time processes.

  5. "A" indicates the amount of time spent running processes at a negative "nice" priority.

The far right of line 3 shows the percentage of CPU utilization. If your system is "CPU-Bound," you will consistently see this number near 100 percent. You get statistics for Current, Average (since analysis was begun), and High.
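The relationship between the Current, Average, and High figures can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how GlancePlus is implemented; the class name `UtilizationTracker` is hypothetical, and the sketch assumes every update interval has the same length so that a simple mean is a fair average.

```python
class UtilizationTracker:
    """Tracks Current, Average, and High for a stream of interval samples."""

    def __init__(self):
        self.current = 0.0   # value over the most recent interval
        self.high = 0.0      # largest value seen since tracking began
        self._total = 0.0    # running sum, used for the average
        self._count = 0      # number of intervals observed

    def update(self, sample):
        """Record the utilization (in percent) for one update interval."""
        self.current = sample
        self.high = max(self.high, sample)
        self._total += sample
        self._count += 1

    @property
    def average(self):
        """Mean of all samples since tracking began (0.0 before any sample)."""
        return self._total / self._count if self._count else 0.0


tracker = UtilizationTracker()
for cpu_pct in (40.0, 95.0, 60.0):   # three made-up interval samples
    tracker.update(cpu_pct)

print(tracker.current, tracker.average, tracker.high)   # 60.0 65.0 95.0
```

A single busy interval raises High permanently but moves Average only slightly, which is why a CPU-bound system shows up as a Current value that stays near 100 percent rather than as one high reading.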

Line 4 shows Disk Utilization for the busiest mounted disk. This bar indicates the percentage of File System and Virtual Memory disk I/O over the update interval. This bar is divided into two parts:

  1. "F" indicates the amount of file system activity of user reads and writes and other non-paging activities.

  2. "V" indicates the percentage of disk I/O devoted to paging virtual memory.

The Current, Avg, and High statistics have the same meaning as in the CPU Utilization description.

Line 5 shows the system memory utilization. This bar is divided into three parts:

  1. "S" indicates the amount of memory devoted to system use.

  2. "U" indicates the amount of memory devoted to user programs and data.

  3. "B" indicates the amount of memory devoted to buffer cache.

The Current, Avg, and High statistics have the same meaning as in the CPU Utilization description.

Line 6 shows Swap Util information, which is divided into two parts:

  1. "R" indicates reserved, but not in use.

  2. "U" indicates swap space in use.

All three of these areas (CPU, Memory, and Disk) may be further analyzed by using the F2, F3, and F4 function keys, respectively. You may see different function keys, depending on the version of GlancePlus you are running. When you select one of these keys, you move from the Process List screen to a screen that provides more in-depth functions in the selected area. In addition, more detailed screens are available for many other system areas. Because most investigation beyond the Process List screen takes place on the CPU, Memory, and Disk screens, I'll describe these in more detail shortly.

The bottom of the Process List screen shows the active processes running on your system. Because there are typically many processes running on a UNIX system, you may want to consider using the o command to set a threshold for CPU utilization. If you set a threshold of five percent, for instance, then only processes that exceed the average CPU utilization of five percent over the interval will be displayed. There are other types of thresholds that can be specified, such as the amount of RAM used (Resident Size). If you specify thresholds, you see only the processes you're most interested in, that is, those consuming the greatest system resources.
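The effect of a CPU threshold can be pictured with a small Python sketch. It is not GlancePlus code; the `ProcessSample` type, the `filter_by_cpu` function, and the sample data are all hypothetical, and the sketch covers only the CPU threshold described above.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    name: str
    pid: int
    cpu_util: float  # percent of CPU used over the update interval


def filter_by_cpu(samples, threshold_pct):
    """Keep only processes above the CPU threshold, busiest first."""
    kept = [p for p in samples if p.cpu_util > threshold_pct]
    return sorted(kept, key=lambda p: p.cpu_util, reverse=True)


# Made-up interval data for illustration only.
samples = [
    ProcessSample("sh", 3301, 0.3),
    ProcessSample("oracle", 101, 42.5),
    ProcessSample("cron", 880, 0.0),
    ProcessSample("vhand", 2, 6.1),
]

shown = filter_by_cpu(samples, threshold_pct=5.0)
for p in shown:
    print(f"{p.name:<10} {p.pid:>6} {p.cpu_util:>6.1f}")
```

With a five percent threshold, only the two processes that matter in this made-up data remain, which is the point of setting thresholds: the list shrinks to the heavy consumers.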

There is a line for each active process that meets the threshold requirements you defined. There may be more than one page of processes to display. The message in the bottom-right corner of the screen indicates which page you are on. You can scroll forward to view the next page with f and backward with b. Usually, only a few processes consume most of your system resources, so I recommend setting the thresholds so that only one page of processes is displayed. There is a whole series of commands you can issue in GlancePlus. The final figures in this section show the commands recognized by GlancePlus.

Here is a brief summary of the process headings:

Process Name

The name or abbreviation used to load the executable program.

PID

The process identification number.

PPID

The PID of the parent process.

Pri

The priority of the process. The lower the number, the higher the priority. System-level processes usually run between 0 and 127. Other processes usually run between 128 and 255. "Nice" processes are those with the lowest priority and they have the largest number.

User Name

Name of the user who started the process.

CPU Util

The first number is the percentage of CPU utilization that this process consumed over the update interval. Note that this is 800% maximum for our eight-processor system. The second number is the percentage of CPU utilization that this process consumed since GlancePlus was invoked. Most system administrators leave GlancePlus running continuously on their systems with a low update interval. Since GlancePlus uses very little system overhead, there is virtually no penalty for this.

Cum CPU

The total CPU time used by the process. GlancePlus uses the "midaemon" to gather information. If the midaemon started before the process, you will get an accurate measure of cumulative CPU time used by the process.

Disk IO Rate

The first number is the average disk I/O rate per second over the last update interval. The second number is the average disk I/O rate since GlancePlus was started or since the process was started. Disk I/O can mean a lot of different things. Disk I/O could mean taking blocks of data off the disk for the first time and putting them in RAM, or it could be entirely paging and swapping. Some processes will simply require a lot more Disk I/O than others. When this number is very high, however, take a close look at whether or not you have enough RAM. Keep in mind that pageout activity, such as deactivation and swapping, is attributed to the vhand process.

RSS Size

The amount of RAM in KBytes that is consumed by the process. This is called the Resident Size. Everything related to the process that is in RAM is included in this column, such as the process's data, stack, text, and shared memory segments. This is a good column to inspect. Because slow systems are often erroneously assumed to be CPU-bound, I always make a point of looking at this column to identify the amount of RAM that the primary applications are using. This is often revealing. Some applications use a small amount of RAM but use large data sets, a point often overlooked when RAM calculations are made. This column shows all the RAM your process is currently using.

Block On

The reason the process was blocked (unable to run). If the process is currently blocked, you will see why. If the process is running, you will see why it was last blocked. There are many reasons a process could be blocked. A list of the most common reasons for a process being blocked follows the Thd Cnt description.

Thd Cnt

The total number of threads for this current process.

Abbreviation   Reason for the Blocked Process

CACHE          Waiting for a cache buffer to become available
DISK           Waiting for a disk operation to complete
INODE          Waiting for an inode operation to complete
IO             Waiting for a non-disk I/O to complete
IPC            Waiting for a shared memory operation to complete
LAN            Waiting for a LAN operation to complete
MESG           Waiting for a message queue operation to complete
NFS            Waiting for an NFS request to complete
PIPE           Waiting for data to or from a pipe
PRI            Waiting because a higher-priority process is running
RFA            Waiting for a Remote File Access to complete
SEM            Waiting for a semaphore to become available
SLEEP          Waiting because the process called sleep or wait
SOCKT          Waiting for a socket operation to complete
SYS            Waiting for system resources
TERM           Waiting for a terminal transfer
VM             Waiting for a virtual memory operation to complete
OTHER          Waiting for a reason GlancePlus can't determine

CPU Report Screen Description

If the Process List screen indicates that the CPU is overworked, you'll want to refer to the CPU Report screen shown in Figure 12-5. It can provide useful information about the CPU states on which GlancePlus reports.

Figure 12-5. HP GlancePlus/UX CPU Report Screen Shot

For each of these states, there are columns that provide additional information. Following is a description of the columns:

Current

Displays the percentage of CPU time devoted to this state over the last time interval.

Average

Displays the average percentage of CPU time spent in this state since GlancePlus was started.

High

Displays the highest percentage of CPU time devoted to this state since GlancePlus was started.

Time

Displays the CPU time spent in this state over the last interval.

Cum Time

Displays the total amount of CPU time spent in this state since GlancePlus was started.

A description of the states follows:

User

CPU time spent executing user activities under normal priority.

Nice

CPU time spent running user code in nice mode.

Negative Nice

CPU time spent running code at a high priority.

Realtime

CPU time spent executing real-time processes that run at a high priority.

System

CPU time spent executing system calls and programs.

Interrupt

CPU time spent executing system interrupts. A high value here may indicate a lot of I/O, such as paging and swapping.

ContSwitch

CPU time spent context switching between processes.

Traps

CPU time spent handling traps.

Vfaults

CPU time spent handling page faults.

Idle

CPU time spent idle.

The CPU Report screen also shows your system's run queue length or load average. This is displayed on the second page of the CPU Report screen. The Current, Average, and High values for the number of runnable processes waiting for the CPU are shown. You may want to get a gauge of your system's run queue length when the system is mostly idle and compare these numbers with those you see when your system is in normal use.

The final area reported on the CPU Report screen is load average, system calls, interrupts, and context switches. I don't inspect these too closely, because if one of these is high, it is normally the symptom of a problem and not the cause of a problem. If you correct a problem, you will see these numbers reduced.

You can use GlancePlus to view all the CPUs in your system, as shown in Figure 12-6. This is an eight-processor system.

Figure 12-6. All CPUs Screen in GlancePlus

Memory Report Screen Description

The Memory Report Screen, shown in Figure 12-7, provides information on several types of memory management events. The statistics shown are in the form of counts, not percentages. You may want to look at these counts for a mostly idle system and then observe what takes place as the load on the system is incrementally increased. My experience has been that many more memory bottlenecks occur than CPU bottlenecks, so you may find this screen revealing.

Figure 12-7. HP GlancePlus/UX Memory Report Screen Shot

The following five statistics are shown for each memory management event:

Current

The number of times an event occurred in the last interval. The count changes if you update the interval, so you may want to select an interval you are comfortable with and stick with it.

Cumulative

The sum of all counts for this event since GlancePlus was started.

Current Rate

The number of events per second.

Cum Rate

Average of the rate over the cumulative collection interval.

High Rate

The highest rate recorded.
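The five statistics are related by simple arithmetic, which the following Python sketch makes explicit. It is an illustration rather than GlancePlus code; the `EventStats` class is hypothetical, and the counts and interval lengths are made up.

```python
class EventStats:
    """Derives the five statistics for one memory management event."""

    def __init__(self):
        self.current = 0          # events in the last interval
        self.cumulative = 0       # events since collection started
        self.current_rate = 0.0   # events per second in the last interval
        self.high_rate = 0.0      # highest per-second rate recorded
        self._elapsed = 0.0       # total seconds of collection

    def update(self, count, interval_seconds):
        """Record the event count observed over one update interval."""
        self.current = count
        self.cumulative += count
        self._elapsed += interval_seconds
        self.current_rate = count / interval_seconds
        self.high_rate = max(self.high_rate, self.current_rate)

    @property
    def cum_rate(self):
        """Average rate over the whole collection period."""
        return self.cumulative / self._elapsed if self._elapsed else 0.0


page_faults = EventStats()
# (count, interval length in seconds); the last interval is twice as long.
for count, seconds in ((50, 5), (200, 5), (20, 10)):
    page_faults.update(count, seconds)

print(page_faults.current, page_faults.cumulative)    # 20 270
print(page_faults.current_rate, page_faults.cum_rate, page_faults.high_rate)
# 2.0 13.5 40.0
```

The Current count depends on how long the interval is, while the rates are normalized per second. That is why raw counts are only comparable when the update interval stays the same.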

Following are brief descriptions of the memory management events for which statistics are provided:

Page Faults

Any address translation fault, such as reclaims, pid faults, and so on.

Page In/Page Out

Pages of data moved from virtual memory (disk) to physical memory (page in), or vice versa.

KB Paged In

The amount of data paged in because of page faults.

KB Paged Out

The amount of data paged out to disk.

Reactivations/Deactivations

The number of processes swapped in and out of memory. A system low on RAM will spend a lot of time swapping processes in and out of RAM. If a lot of this type of swapping is taking place, you may see high CPU utilization and some other statistics may increase as well. These may only be symptoms that a lot of swapping is taking place.

KB Reactivated

The amount of information swapped into RAM as a result of processes having been swapped out earlier due to insufficient RAM.

KB Deactivated

The amount of information swapped out when processes are moved to disk.

VM Reads

The total count of virtual memory reads from disk. The higher this number, the more often your system is going to disk.

VM Writes

The total count of memory management I/O.

The following values are also on the Memory screen:

Total VM

The amount of total virtual memory used by all processes.

Active VM

The amount of virtual memory used by all active processes.

Sys Mem

The amount of memory devoted to system use.

Buf Cache Size

The current size of buffer cache.

User Mem

The amount of memory devoted to user programs and data.

Free Memory

The amount of RAM not currently allocated for use.

Phys Memory

The total RAM in your system.

This screen gives you a lot of information about how your memory subsystem is being used. You may want to view some statistics when your system is mostly idle and when it is heavily used and compare the two. Some good numbers to record are "Free Memory" (to see whether you have any free RAM under either condition) and "Total VM" (to see how much virtual memory has been allocated for all your processes). A system that is RAM-rich will have available memory; a system that is RAM-poor will allocate a lot of virtual memory.

Disk Report Screen Description

The Disk Report screen appears in Figure 12-8. You may see groupings of "local" and "remote" information.

Figure 12-8. HP GlancePlus/UX Disk Report Screen Shot

There are eight disk statistics provided for eight events related to logical and physical accesses to all the disks mounted on the local system. These events represent all the disk activity taking place on the system.

Here are descriptions of the eight disk statistics provided:

Requests

The total number of requests of that type over the last interval.

%

The percentage of this type of disk event relative to other types.

Rate

The average number of requests of this type per second.

Bytes

The total number of bytes transferred for this event over the last interval.

Cum Req

The cumulative number of requests since GlancePlus started.

%

The relative percentage of this type of disk event since GlancePlus started.

Cum Rate

Average of the rate over the cumulative collection interval.

Cum Bytes

The total number of bytes transferred for this type of event since GlancePlus started.

Next are descriptions of the disk events for which these statistics are provided, which may be listed under "Local" on your system:

Logl Rds and Logl Wts

The number of logical reads and writes to a disk. Because disks normally use memory buffer cache, a logical read may not require physical access to the disk.

Phys Rds

The number of physical reads to the disk. These physical reads may be due to either file system logical reads or to virtual memory management.

Phys Wts

The number of physical writes to the disk. This may be due to file system activity or virtual memory management.

User

The amount of physical disk I/O as a result of user file I/O operations.

Virtual Mem

The amount of physical disk I/O as a result of virtual memory management activity.

System

Housekeeping I/O such as inode updates.

Raw

The amount of raw mode disk I/O.

A lot of disk activity may also take place as a result of NFS mounted disks. Statistics are provided for "Remote" disks as well.

Disk access is required on all systems. The question to ask is: What disk activity is unnecessary and slowing down my system? A good place to start is to compare the amount of "User" disk I/O with "Virtual Mem" disk I/O. If your system is performing much more virtual memory I/O than user I/O, you may want to investigate your memory needs.
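The comparison of "User" and "Virtual Mem" disk I/O can be expressed as a one-line check. The Python sketch below is hypothetical: the function name `vm_io_dominates` is invented for illustration, and the `factor` of 2.0 is an assumed reading of "much more", not a figure from GlancePlus.

```python
def vm_io_dominates(user_io, virtual_mem_io, factor=2.0):
    """Return True when virtual memory I/O is at least `factor` times user I/O.

    Both arguments are physical disk I/O counts taken over the same interval.
    The default factor is an assumption chosen for illustration.
    """
    return virtual_mem_io > 0 and virtual_mem_io >= factor * user_io


print(vm_io_dominates(user_io=1200, virtual_mem_io=150))   # False
print(vm_io_dominates(user_io=100, virtual_mem_io=900))    # True
```

A True result is a prompt to look at the Memory Report screen, not a diagnosis in itself.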

GlancePlus Summary

In addition to the Process List, or Global, screen and the CPU, Memory, and Disk screens described earlier, there are many other useful screens, including the following:

Swap Space

Shows details of all swap areas. May be called by another name in other releases.

Netwk By Intrface

Gives details about each LAN card configured on your system. This screen may have another name in other releases.

NFS Global

Provides details on inbound and outbound NFS-mounted file systems. May be called by another name in other releases.

Select Process

Allows you to select a single process to investigate. May be called by another name in other releases.

I/O By File Sys

Shows details of I/O for each mounted disk partition.

I/O By Disk

Shows details of I/O for each mounted disk.

I/O By Logl Vol

Shows details of I/O for each mounted logical volume.

System Tables

Shows details of internal system tables.

Process Threshold

Defines which processes will be displayed on the Process List screen (which may be called by another name, such as the Global screen, in other releases).

As you can see, although I described the four most commonly used screens in detail, you can use many others to investigate your system further.

There are also many commands that you can issue within GlancePlus. Figures 12-9 and 12-10 show the Command List screens in GlancePlus.

Figure 12-9. HP GlancePlus/UX Command List Screen 1

Figure 12-10. HP GlancePlus/UX Command List Screen 2

Using VantagePoint Performance Agent to Identify Bottlenecks

VantagePoint Performance Agent allows you to view many metrics related to system performance that can help you identify the source of bottlenecks in your system. You can use the graphical version of GlancePlus, called gpm, to specify the metrics you want to keep track of. You can then view them in the gpm interface and sort them a variety of different ways.

The following are the most important types of bottlenecks you can encounter on a system and the metrics associated with each type of bottleneck. This information was provided by Doug Grumann and Stephen Ciullo, two performance experts at Hewlett-Packard.

  1. CPU bottleneck using VantagePoint Performance Agent:

    • Consistently high global CPU utilization with GBL_CPU_TOTAL_UTIL>90%, combined with the next bullet.

    • Significant Run Queue or Load Average indicated by GBL_PRI_QUEUE or GBL_RUN_QUEUE>3.

    • Look for processes blocked on priority with PROC_STOP_REASON=PRI.

  2. System CPU bottleneck using VantagePoint Performance Agent (same as 1 with addition of first bullet):

    • Most of the CPU time spent in kernel mode with GBL_CPU_SYS_MODE_UTIL>50%.

    • Consistently high global CPU utilization with GBL_CPU_TOTAL_UTIL>90%, combined with the next bullet.

    • Significant Run Queue or Load Average indicated by GBL_PRI_QUEUE or GBL_RUN_QUEUE>3.

    • Look for processes blocked on priority with PROC_STOP_REASON=PRI.

  3. Context switching bottleneck using VantagePoint Performance Agent (same as 2 with addition of first bullet):

    • Significant CPU time spent switching with GBL_CPU_CSWITCH>30%.

    • Most of the CPU time spent in kernel mode with GBL_CPU_SYS_MODE_UTIL>50%.

    • Consistently high global CPU utilization with GBL_CPU_TOTAL_UTIL>90%, combined with the next bullet.

    • Significant Run Queue or Load Average indicated by GBL_PRI_QUEUE or GBL_RUN_QUEUE>3.

    • Look for processes blocked on priority with PROC_STOP_REASON=PRI.

  4. User CPU bottleneck using VantagePoint Performance Agent (same as 1 with addition of first bullet):

    • Most of the CPU time spent in user mode with GBL_CPU_USER_MODE_UTIL>50%.

    • Consistently high global CPU utilization with GBL_CPU_TOTAL_UTIL>90%, combined with the next bullet.

    • Significant Run Queue or Load Average indicated by GBL_PRI_QUEUE or GBL_RUN_QUEUE>3.

    • Look for processes blocked on priority with PROC_STOP_REASON=PRI.

  5. Disk bottleneck using VantagePoint Performance Agent:

    • At least one disk device with consistently high utilization with BYDSK_UTIL>50%.

    • Queue lengths greater than zero with BYDSK_QUEUE>0.

    • Processes or threads blocked on I/O for a variety of reasons with PROC_STOP_REASON=CACHE, DISK or IO.

    • Look for processes blocked on priority with PROC_STOP_REASON=PRI.

  6. Buffer cache bottleneck using VantagePoint Performance Agent:

    • Moderate utilization of at least one disk with BYDSK_UTIL>25%.

    • Queue lengths greater than zero with BYDSK_QUEUE>0.

    • Low Buffer cache read hit percentage with GBL_MEM_CACHE_HIT_PCT<90%.

    • Processes or threads blocked on cache with PROC_STOP_REASON=CACHE.

  7. Memory bottleneck using VantagePoint Performance Agent:

    • High physical memory utilization with GBL_MEM_UTIL>95%.

    • Significant pageouts or any deactivations with GBL_MEM_PAGEOUT_RATE>1 or GBL_MEM_SWAPOUT_RATE>0.

    • vhand processes consistently active with vhand's PROC_CPU_TOTAL_UTIL>5%.

    • Processes or threads blocked on virtual memory with PROC_STOP_REASON=VM.

  8. Networking bottleneck using VantagePoint Performance Agent:

    • High network packet rates with GBL_NET_PACKET_RATE>2 average. Keep in mind that this varies greatly depending on configuration.

    • Any output queuing taking place with GBL_NET_OUTQUEUE>0.

    • Higher than normal number of processes or threads blocked on networking with PROC_STOP_REASON=NFS, LAN, RPC or SOCKET, or GBL_NETWORK_SUBSYSTEM_QUEUE>average.

    • One CPU with a high system mode CPU utilization while other CPUs are mostly idle with BYCPU_CPU_INTERRUPT_TIME>30.

    • Using lanadmin, check for frequent incrementing of Outbound Discards or excessive Collisions.
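Several of the rules above can be written down as a small checker. The Python sketch below is an illustration only. It assumes the metric values have already been collected into plain dictionaries keyed by the metric names used in the list, and the function name `find_bottlenecks` is hypothetical. How the bullets combine is also an assumption: utilization and queue length must both be exceeded, "GBL_PRI_QUEUE or GBL_RUN_QUEUE>3" is read as either queue exceeding 3, and the PROC_STOP_REASON bullets are treated as supporting evidence and are not checked here.

```python
def find_bottlenecks(gbl, disks):
    """Return suspected bottleneck types from one set of metric values.

    gbl   -- dict of global metric name -> value (missing metrics count as 0)
    disks -- list of dicts, one per disk, with BYDSK_UTIL and BYDSK_QUEUE
    """
    found = []

    # CPU family: high total utilization together with a significant queue.
    cpu_busy = gbl.get("GBL_CPU_TOTAL_UTIL", 0) > 90
    queued = max(gbl.get("GBL_PRI_QUEUE", 0), gbl.get("GBL_RUN_QUEUE", 0)) > 3
    if cpu_busy and queued:
        found.append("CPU")
        if gbl.get("GBL_CPU_SYS_MODE_UTIL", 0) > 50:
            found.append("System CPU")
            if gbl.get("GBL_CPU_CSWITCH", 0) > 30:
                found.append("Context switching")
        if gbl.get("GBL_CPU_USER_MODE_UTIL", 0) > 50:
            found.append("User CPU")

    # Disk: at least one device that is both busy and queuing requests.
    if any(d["BYDSK_UTIL"] > 50 and d["BYDSK_QUEUE"] > 0 for d in disks):
        found.append("Disk")

    # Memory: high utilization plus pageouts or any deactivations.
    paging = (gbl.get("GBL_MEM_PAGEOUT_RATE", 0) > 1
              or gbl.get("GBL_MEM_SWAPOUT_RATE", 0) > 0)
    if gbl.get("GBL_MEM_UTIL", 0) > 95 and paging:
        found.append("Memory")

    return found


# Made-up sample values for illustration.
gbl = {
    "GBL_CPU_TOTAL_UTIL": 97,
    "GBL_RUN_QUEUE": 6,
    "GBL_CPU_SYS_MODE_UTIL": 62,
    "GBL_CPU_CSWITCH": 8,
    "GBL_MEM_UTIL": 80,
    "GBL_MEM_PAGEOUT_RATE": 0,
}
disks = [{"BYDSK_UTIL": 12, "BYDSK_QUEUE": 0}]

print(find_bottlenecks(gbl, disks))   # ['CPU', 'System CPU']
```

Because the rules speak of consistently high values, a checker like this is best applied to averages over many intervals rather than to a single sample.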

In order to identify a problem on your system, you must first characterize your system when it is running smoothly and has no problems. Should your system start to perform poorly in some respect or another, you can compare the performance data of a smoothly running system to one with potential problems.
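Comparing against a baseline can be as simple as the following Python sketch. It is hypothetical throughout: the function name `compare_to_baseline`, the 25 percent tolerance, and the sample values are assumptions made for illustration, and the baseline is assumed to have been recorded while the system was running smoothly.

```python
def compare_to_baseline(baseline, current, tolerance=0.25):
    """Return metrics that have grown past the baseline by more than tolerance.

    baseline, current -- dicts of metric name -> value
    tolerance         -- allowed relative growth (0.25 means 25 percent)
    The result maps each flagged metric to (baseline value, current value).
    """
    changed = {}
    for name, base in baseline.items():
        now = current.get(name)
        if now is None:
            continue  # metric was not collected this time
        if base == 0:
            # Any activity counts as a change when the baseline was zero.
            if now > 0:
                changed[name] = (base, now)
        elif (now - base) / base > tolerance:
            changed[name] = (base, now)
    return changed


# Made-up values: a smoothly running system versus a slow one.
baseline = {"GBL_CPU_TOTAL_UTIL": 35.0, "GBL_MEM_UTIL": 70.0,
            "GBL_MEM_PAGEOUT_RATE": 0.0}
current = {"GBL_CPU_TOTAL_UTIL": 38.0, "GBL_MEM_UTIL": 96.0,
           "GBL_MEM_PAGEOUT_RATE": 4.0}

changed = compare_to_baseline(baseline, current)
for name in sorted(changed):
    print(name, changed[name])
```

In this made-up case, CPU utilization is within tolerance while memory utilization and the pageout rate stand out, which points the investigation toward memory.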


       