Monitoring and Managing Microsoft Exchange Server 2003 (HP Technologies)



Effective management of the Exchange environment requires disciplined monitoring and reporting. Metrics are the standards by which the quality of service is measured. Reporting covers generating the measurement data, distributing it to the appropriate audience, and reviewing it. Depending on its type, a report may summarize the measurement data, illustrate trends, correlate multiple metrics, or compare measurements to past or desired values.

Measurement of service quality is a key component in most of the processes involved in administering any professionally managed messaging implementation. Metrics and reporting can help the operations group, and the user community, understand the current performance of the e-mail environment and how it is being used. Metrics can identify system performance trends and changes in usage patterns. They are the primary way to validate that the e-mail system is providing the level of service specified in SLAs. After changes have been made to key system components, metrics can validate that the system continues to perform as expected. Changes in key metrics can help determine when key resources, such as the number of processors, CPU speed, memory, disk space, and network bandwidth, need to be upgraded. Operations reports generally fall into one of the categories described in the following sections. Later chapters of this book will help to explain where to collect the information for these types of reports.

2.9.1 Use and capacity reports

A use and capacity report supplies the data for analyzing the long-term changes in system and network usage. By tracking these use and capacity changes, it is possible to predict when system components will need to be upgraded.
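
The prediction step can be illustrated with a simple trend projection. The following is a minimal sketch, not a method prescribed by this book: it assumes one disk-usage sample per month and a roughly linear growth pattern, and the function name `months_until_full` and the sample figures are hypothetical.

```python
from typing import Optional, Sequence


def months_until_full(usage_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Fit a least-squares line to monthly usage samples and estimate how
    many months after the last sample the trend line reaches capacity.

    Returns None when usage is flat or shrinking (no upgrade date implied).
    """
    n = len(usage_gb)
    if n < 2:
        raise ValueError("need at least two monthly samples")

    xs = range(n)  # month index: 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(usage_gb) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage_gb))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    # Month index at which the trend line crosses capacity, measured
    # relative to the most recent sample (index n - 1).
    month_full = (capacity_gb - intercept) / slope
    return max(0.0, month_full - (n - 1))


# Hypothetical data: store grows 10 GB per month on a 200 GB volume.
print(months_until_full([100, 110, 120, 130], capacity_gb=200))  # 7.0
```

A straight-line fit is only a first approximation; seasonal usage or a sudden increase in users would call for a different model.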

Hardcopy use and capacity reports should be published and reviewed monthly. Typical report data should include metrics, such as the following:

2.9.2 Usage reports

Usage reports are designed to show how heavily the messaging system is being used and which users are using the most resources, such as disk space and network bandwidth. As with the use and capacity reports, tracking the usage changes over time will help to identify when resources will need to be increased.
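
As an illustration of identifying the heaviest users, the sketch below ranks mailboxes by size. It assumes the mailbox sizes have already been collected into a dictionary; the function name `top_consumers` and the user names and sizes are hypothetical.

```python
from typing import Dict, List, Tuple


def top_consumers(mailbox_mb: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n largest mailboxes as (user, size in MB), largest first."""
    ranked = sorted(mailbox_mb.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical sample data for one reporting week.
sizes = {"alice": 900, "bob": 150, "carol": 1200}
print(top_consumers(sizes, n=2))  # [('carol', 1200), ('alice', 900)]
```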

Usage reports should be published to the intranet on a weekly basis. Summaries could be published for quarterly review meetings. Typical report data should include metrics, such as the following:

2.9.3 System health snapshots

System health snapshots are typically brief summaries that report the current performance level and recent behavior of the system. The primary purpose of these snapshots is to verify that the system is operating as expected. They can also be used to detect changes in performance caused by faults, resource depletion, increased or decreased usage, or problems with underlying components such as the network.
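
One way to detect such changes is to compare each snapshot against a baseline. The sketch below is an assumption-laden illustration: the metric names, the 20 percent tolerance, and the function name `flag_deviations` are all hypothetical, and a real baseline would come from the organization's own measurements.

```python
from typing import Dict


def flag_deviations(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.20,
) -> Dict[str, float]:
    """Return metrics whose relative change from baseline exceeds tolerance.

    The result maps each flagged metric name to its fractional change
    (for example, 1.5 means 150 percent above baseline).
    """
    flagged = {}
    for name, base in baseline.items():
        # Skip metrics missing from today's snapshot or with a zero
        # baseline, where a relative change is undefined.
        if name not in current or base == 0:
            continue
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            flagged[name] = change
    return flagged


# Hypothetical snapshot: queue length has jumped, CPU is within tolerance.
baseline = {"queue_length": 10, "cpu_percent": 40}
today = {"queue_length": 25, "cpu_percent": 42}
print(flag_deviations(baseline, today))  # {'queue_length': 1.5}
```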

System health snapshots should be published to the intranet each day. The operations group should review these reports carefully and consistently, checking for changes in performance that might be early indicators of a problem. Typical report data should include metrics, such as the following:

2.9.4 Service level agreement compliance reports

SLA compliance reports are designed to monitor the messaging system's compliance with the SLAs that the operations group has established with the user community. Similar reports also may be used to monitor system performance against internal organizational service targets.

These reports will be used as a communication mechanism between the operations group and the user community. The publication schedule, publication mechanism, and metrics for these reports should be negotiated with the user groups as part of the SLAs. Potential metrics may include availability (percentage uptime during the service window), reliability (percentage of correctly addressed messages that are successfully delivered), message delivery rate, message delivery time, and mean time to restore service in the event of a service outage.
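
Three of these metrics reduce to simple arithmetic. The sketch below shows one possible calculation, assuming outages are recorded as durations in minutes within the service window; the function names and the sample figures are hypothetical, and the exact definitions should follow whatever the SLA specifies.

```python
from typing import Sequence


def availability_pct(service_window_minutes: float, outage_minutes: Sequence[float]) -> float:
    """Percentage of the service window during which service was up."""
    downtime = sum(outage_minutes)
    return 100.0 * (service_window_minutes - downtime) / service_window_minutes


def reliability_pct(correctly_addressed: int, delivered: int) -> float:
    """Percentage of correctly addressed messages that were delivered."""
    return 100.0 * delivered / correctly_addressed


def mean_time_to_restore(outage_minutes: Sequence[float]) -> float:
    """Average outage duration in minutes; 0.0 when there were no outages."""
    if not outage_minutes:
        return 0.0
    return sum(outage_minutes) / len(outage_minutes)


# Hypothetical reporting period: a 10,000-minute service window, two outages.
outages = [30, 70]
print(availability_pct(10_000, outages))   # 99.0
print(mean_time_to_restore(outages))       # 50.0
```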

2.9.5 Problem reports

It is important to have a database of reported problems and their solutions. Problems that at first appear to be isolated may prove to be systemic. Recording and reporting problem information may provide clues for early identification of systemic problems. The problem reports should include information about the number of problems reported, the number of problems solved, the most commonly reported problems, and system availability during the reporting period.
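
The counts described above can be derived directly from the problem records. The following sketch assumes each record is a dictionary with a `category` string and a `solved` flag; that record layout, the function name `summarize_problems`, and the sample tickets are hypothetical. System availability is omitted here because it comes from monitoring data rather than the problem database.

```python
from collections import Counter
from typing import Dict, List


def summarize_problems(tickets: List[Dict], top_n: int = 3) -> Dict:
    """Summarize a reporting period's problem records.

    Returns the number reported, the number solved, and the top_n most
    commonly reported categories as (category, count) pairs.
    """
    reported = len(tickets)
    solved = sum(1 for ticket in tickets if ticket["solved"])
    by_category = Counter(ticket["category"] for ticket in tickets)
    return {
        "reported": reported,
        "solved": solved,
        "most_common": by_category.most_common(top_n),
    }


# Hypothetical records for one reporting period.
tickets = [
    {"category": "mailbox full", "solved": True},
    {"category": "delivery delay", "solved": False},
    {"category": "mailbox full", "solved": True},
]
print(summarize_problems(tickets, top_n=1))
# {'reported': 3, 'solved': 2, 'most_common': [('mailbox full', 2)]}
```

A category that keeps reappearing at the top of this list across reporting periods is a candidate for the kind of systemic problem described above.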

2.9.6 Change control reports

Changes to any production environment need to be carefully considered, planned, and tested before being implemented. Changes also need to be communicated to other operational groups and to the user community. Change control reports provide an audit trail of configuration changes that can be useful for problem solving.

2.9.7 Design guidelines for operational reports

Operational reports should be designed with the target audience in mind. People generally suffer from an overload of information. A user should be able to quickly determine whether the information contained within a report warrants careful review. The following guidelines will help make reports more useful:

Reports do not need to be published on paper. In fact, some types of reports definitely should be delivered using other methods. The distribution method depends primarily on the purpose of the report and the target audience. Operational reports that are to be formally reviewed in group meetings generally should be distributed as hardcopy reports. Reports for managers and executives should be brief and generally delivered as an e-mail message. The corporate intranet is a good place to publish reports, such as service level compliance reports, that are designed for users and groups outside the operations team.

