Microsoft® Office SharePoint® Server 2007 Administrators Companion

The first task is to understand which subsystems are critical to scrutinize. Begin with one or more of the following subsystems: processor, memory, disk, and network.

A weakness in any one of these areas can create a significant bottleneck in one or more of the others. Looking at each in detail will help you determine what sort of impact might be incurred.

Monitoring Processor Utilization

The processor is the most obvious component that is critical to the performance of the system. But with a long list of potential counters, you need to pare down what is important to monitor and why you are monitoring it. There are multiple counters that can reveal potential CPU bottlenecks, but the following three cover the majority of issues:

Note 

When we refer to an object, counter, or instance in this chapter, the format will be as follows: Object\Counter\Instance.

Monitoring Memory Utilization

In many cases, system administrators are tempted to "throw memory" at the problem. This can work in the short term, but a correctly diagnosed problem will help you to avoid spending potentially thousands of dollars without actually resolving the issue. Monitoring memory counters can reap significant rewards.

Monitoring Disk Utilization

There are two types of counters for disk: physical and logical. Physical disk counters refer to a disk without regard for grouping configurations, such as a concatenation of disks or RAID sets. Logical disk counters report only on the activity of the logical disk in a grouping. A great deal of performance benefit can be gained by tracking down and resolving disk issues. Because even the newest hard drives are orders of magnitude slower than memory or the processor, small gains will return large rewards. Note that if you are focusing your monitoring on disk-related issues, you should log your data to another server to ensure you are not adding load to your disk subsystem.

The counters just listed for physical drives pertain to logical disks as well, and in the same manner. Differences occur with RAID sets and dynamic disks. With a RAID set, it is possible to have greater than 100% Disk Time. Use the Avg. Disk Queue Length counter to determine the requests pending for the disks. When dynamic disks are in use, logical counters are removed. When you have a dynamic volume with more than one physical disk, instances will be listed as 'Disk 0 C:', 'Disk 1 C:', 'Disk 0 D:', and so on. In situations where you have multiple volumes on a single drive, instances will be listed as '0 C: D:'.
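
As a quick worked example of that queue-length guidance, here is a minimal Python sketch that applies the same per-disk rule of thumb Table 13-1 uses for Current Disk Queue Length; the function and the sample values are hypothetical.

    def disk_queue_ok(avg_disk_queue_length, disks_in_set, per_disk_threshold=2):
        """Check an observed queue length against a per-disk rule of thumb.

        A common guideline (used in Table 13-1 for Current Disk Queue Length)
        is roughly two outstanding requests per physical disk in the set.
        """
        allowed = disks_in_set * per_disk_threshold
        return avg_disk_queue_length < allowed, allowed

    # Example: a 4-disk RAID set reporting an average queue length of 11
    ok, allowed = disk_queue_ok(11, 4)
    print(f"Within threshold: {ok} (limit {allowed})")  # Within threshold: False (limit 8)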

Storage Area Network Disk Monitoring

There are differences when monitoring disks on a Storage Area Network (SAN). A SAN differs from a physical disk in that you must be concerned with how many disks make up the logical unit number (LUN). Your SAN administrator will be able to provide that information. Most SANs will return a value to the Performance tool as if a single physical disk were being monitored. This number is inaccurate because it is the additive value of all the disks. To determine the correct value, divide the Performance tool result by the number of disks in the LUN. Typically, physical disk counters and logical disk counters will return the same value on a SAN. It is a good idea to check with your SAN team before you start using the Performance tool, because tools written specifically for the SAN hardware generally give better information. However, that data may not be available in a usable format, and this is where the Performance tool can be very useful.
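
To illustrate the division just described, here is a minimal Python sketch; the function name and the sample reading are hypothetical.

    def per_disk_value(reported_value, disks_in_lun):
        """Convert an aggregate Performance tool reading for a SAN LUN into a
        per-disk figure by dividing by the number of disks backing the LUN."""
        return reported_value / disks_in_lun

    # Example: % Disk Time reported as 240 for a LUN backed by 6 disks
    print(per_disk_value(240, 6))  # 40.0, or roughly 40 percent per disk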

Monitoring Network Utilization

Many companies employ server administrators who must wear multiple hats. It is not uncommon for the person who maintains servers to also maintain personal computers and the network. Windows Server exposes some very good counters for helping to track network-related issues. If you must play the role of the network engineer in a smaller company, be aware that there are a multitude of helpful counters. In large companies with distinct network and server teams, these counters can be invaluable in coordinating with other groups to resolve complex challenges.

In most modern servers, the network card has a processor to handle the moving and encoding of network traffic. However, you might still administer systems that do not have server-level network cards. It is important to monitor processor and memory along with network statistics to determine the root cause of problems that arise. Unlike other counters previously covered in this chapter, network monitoring is done at different layers of the OSI model ranging from the Data-link layer up to the Presentation layer. Because most companies use Ethernet as the network medium and TCP/IP as the protocol, that will be the focus of this section. The TCP/IP layer model maps directly to the OSI model. All layers are monitored with different counters due to the unique nature of each.

More Info 

For more information on how TCP/IP is implemented on Microsoft Windows platforms, see the online book titled TCP/IP Fundamentals for Microsoft Windows found at http://www.microsoft.com/technet/itsolutions/network/evaluate/technol/tcpipfund/tcpipfund.mspx. For a map of the TCP/IP and OSI models, go to the following Web site: http://www.microsoft.com/library/media/1033/technet/images/itsolutions/network/evaluate/technol/tcpipfund/caop0201_big.gif.

Monitoring at the Data-Link Layer

The data-link layer is the bottom layer in the TCP/IP protocol stack. Even though the processes within the layer depend on the physical medium (Ethernet, SONET, ATM, and so on) and device drivers, the information is passed on to the TCP/IP stack. It is crucial that you monitor these counters when exploring network-related bottlenecks.

A good rule of thumb for maximum expected throughput is ((Network Card Speed x 2) / 8) x 75%. Most networks use switches, which allow for full duplex (sending and receiving at the same time), which is why the speed is doubled in the formula. Divide the result by 8 to convert from bits to bytes. Only 75 percent of the listed speed is counted because of TCP/IP's and Ethernet's error checking and packet assembly/disassembly. For a 100-Mbit Ethernet card, you can expect a maximum throughput of 18.75 megabytes (MB) per second. If applications or users are experiencing slow data-transfer speeds, confirm that your network cards are set to full duplex if you are in a switched environment. If you are not sure, set the card to auto-detect duplex or ask your network administrator for their requirements.
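
The rule of thumb translates directly into a small calculation. The following Python sketch (the function name and parameter defaults are illustrative) reproduces the formula and the 100-Mbit example.

    def max_expected_throughput_mb(link_speed_mbit, full_duplex=True, efficiency=0.75):
        """Rule-of-thumb maximum throughput in megabytes per second:
        ((link speed x 2 for full duplex) / 8 bits per byte) x 75% efficiency."""
        multiplier = 2 if full_duplex else 1
        return (link_speed_mbit * multiplier / 8) * efficiency

    print(max_expected_throughput_mb(100))   # 18.75 MB/s for a 100-Mbit card
    print(max_expected_throughput_mb(1000))  # 187.5 MB/s for a gigabit card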

Monitoring at the Network Layer

This is the first layer that is independent of the physical medium. The network layer handles the routing of packets across a heterogeneous network. When referring to the OSI network model, this layer and its functions are referred to as layer 3.

Monitoring at the Transport Layer

The transport layer is responsible for ensuring packets arrive intact or are retransmitted, for congestion control, and for packet ordering. This layer does much of the heavy work with regard to the network. Many network problems can occur here, and therefore this is one of the most critical layers to monitor.

Monitoring at the Presentation Layer

This layer ensures that the information from the network layer is available in the correct format to the system. It ensures that translation and encryption or decryption are performed before the data is passed on.

There are two counter objects under this heading to be concerned with: Server and Redirector. The Server object is specifically for monitoring the server, that is, the machine serving the information. The Redirector object is used when monitoring client machines. Either machine, both, or neither might be a server in the hardware sense; the distinction refers to how they interact with each other in the client-server paradigm.
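
As a small illustration of that distinction, the sketch below maps each role to one representative counter path from the corresponding object; the specific counters shown are assumptions chosen only to indicate which object applies to which role.

    # Which performance object to query depends on the machine's role in the
    # client-server exchange, not on its hardware.
    ROLE_TO_COUNTER = {
        "serving machine": r"\Server\Bytes Total/sec",      # Server object
        "client machine": r"\Redirector\Bytes Total/sec",   # Redirector object
    }

    for role, counter in ROLE_TO_COUNTER.items():
        print(f"{role}: {counter}")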

Note 

It is helpful to involve networking staff when tracking down possible network issues. Network engineers understand the network and are familiar with how it should respond. Be cautious when monitoring network counters without the cooperation of the network team.

Baselining Your SharePoint Server 2007 Install

There are many references to baselining in this chapter, but what exactly does it mean? Baselining means recording performance statistics for a relevant set of counters during representative usage periods, including regular operating hours and off-peak times. Gathering statistics during time frames with heavy, light, and no usage will help define what is normal for an individual system. There are quite a few options to choose from when monitoring your front-end SharePoint servers, but the most important counters are listed in Table 13-1.

Table 13-1: Monitoring Options

Object\Counter                                      Threshold/Description
Processor\% Processor Time\_Total                   < 75%
System\Processor Queue Length\(N/A)                 < # of CPUs x 2
Memory\Available MBytes\(N/A)                       < 80%
Memory\Pages/sec\(N/A)                              < 100
PhysicalDisk\% Disk Time\DataDrive                  < 80%
PhysicalDisk\Current Disk Queue Length\DataDrive    < # of disks x 2
ASP.NET Applications\Requests/sec\_Total            No hard limit; determine this total by baselining.
ASP.NET\Worker Process Restarts                     Any number above zero can indicate that problems exist.
.NET CLR Memory\% Time in GC                        Time spent in garbage collection. Thresholds depend on many factors, but a value over 25% could indicate that there are too many unreachable objects.
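
To make the baselining idea concrete, the following Python sketch samples a few of the Table 13-1 counters with the Windows typeperf utility and flags readings that exceed their rule-of-thumb thresholds. The counter selection, the two-CPU assumption behind the queue-length limit, and the parsing details are illustrative only; adjust them for your environment.

    """Sample a few Table 13-1 counters with typeperf and flag high readings."""
    import csv
    import subprocess

    # Counter paths drawn from Table 13-1, with rule-of-thumb thresholds.
    COUNTERS = {
        r"\Processor(_Total)\% Processor Time": 75.0,
        r"\System\Processor Queue Length": 4.0,   # assumes 2 CPUs x 2
        r"\Memory\Pages/sec": 100.0,
    }

    def sample_counters():
        """Collect a single sample of each counter and report its status."""
        cmd = ["typeperf", *COUNTERS, "-sc", "1"]
        output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        rows = [row for row in csv.reader(output.splitlines()) if row]
        header, values = rows[0], rows[1]   # first data row follows the header row
        for name, value in zip(header[1:], values[1:]):   # skip the timestamp column
            for path, threshold in COUNTERS.items():
                if path.lower() in name.lower():
                    status = "OK" if float(value) < threshold else "OVER THRESHOLD"
                    print(f"{path}: {float(value):.2f} ({status})")

    if __name__ == "__main__":
        sample_counters()

In practice, you would schedule repeated samples across heavy, light, and idle periods and log them to a file or to another server, rather than relying on a single reading.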
