Upgrading and Repairing Servers

2017-07-07 02:10:07

Capacity planning, also called right sizing, is the process of developing a working model, or hypothesis, of the load placed on your server and of the server power needed to handle that load. When you successfully determine the right size for your server(s), you will have achieved the following five goals, in order of importance:

1.	Adequate service levels: Your server can serve its clients for the purpose it was intended for, under peak loads as well as average loads.
2.	Reasonable excess overhead: You don't want to buy too much server and have much of your investment lie idle.
3.	Appropriate fault tolerance: The server should be available often enough to satisfy your users. Additional redundancy costs additional money, so right sizing your server requires that you take this factor into account.
4.	An upgrade path: If your service grows over time, you may need to increase your server's capacity. This might mean adding processors or memory, more network connections, and so forth.
5.	An appropriate life cycle: An appropriate life cycle is one that makes your investment in the server worthwhile. A server that was deployed for Windows 2000 (for example) and then upgraded to Windows Server 2003 has a longer potential life cycle than one stuck at Windows 2000. A first-line server typically has a somewhat longer useful life than a PC because server motherboards have more upgrade options. Most corporations write off PCs after about three years of service; servers typically have four to five years of life.

Knowable, Unknowable, Known, and Unknown

As you might rightly suppose, it is hard to completely balance all the factors involved in capacity planning. There are known unknowns, such as the following:

How will your business activity change in the future?

Will you need to support additional users?

Will future operating systems or application upgrades make additional demands on your server?

There are also unknown unknowns, such as these:

Will your company merge, and will your server be asked to support an entirely new operating system or application?

Will your design specifications be made redundant (so to speak) by a required change in availability?

Will the budget change in a way that adversely affects your ability to upgrade or manage your server?

Let's face it: Capacity planning is a very difficult task because no one can predict the future, and technology advances are uncertain.

Let's start on some solid ground, with what we might defensively call the knowable knowns. There are two basic approaches to capacity planning, and which one you use depends on whether you have, or can get, historical data for your usage pattern. If you can't, you must rely on the experience of others who run a server like the one you intend to deploy, in a situation that is similar to yours or can at least be extrapolated to it. When you can establish historical data, you have more control over the design you choose and the result you can achieve. Let's look at these two situations in a little more detail.

The "art" of capacity planning can yield to statistical analysis, although many people find it easier to determine that a certain type of server will not be able to sustain a load than to determine that it can. Consider a server with a disk system capable of 250 IOPS (I/O operations per second) when you need to attain 500 IOPS. You know that the system won't give you the performance you want, but what you don't know, and can't know in advance, is whether doubling the number of disks will achieve the rate you want. At some point you are going to max out your I/O bus, and you can't know whether that point will be reached somewhere between 250 and 500 IOPS or beyond.
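To make that uncertainty concrete, here is a toy model rather than a measurement method: aggregate throughput grows with the number of disks until it hits a shared ceiling. The per-disk rate (125 IOPS) and the bus ceiling (400 IOPS) are hypothetical values chosen only for illustration; in practice you don't know the ceiling in advance, which is exactly the point of the paragraph above.

```python
# Toy model: disks add IOPS linearly until a shared bus/controller ceiling.
# Both constants are hypothetical, chosen only to illustrate the text's example.
PER_DISK_IOPS = 125     # assumed random IOPS per disk (2 disks = 250 IOPS)
BUS_LIMIT_IOPS = 400    # assumed I/O bus ceiling; unknown in real life


def array_iops(disks: int, per_disk: int = PER_DISK_IOPS,
               bus_limit: int = BUS_LIMIT_IOPS) -> int:
    """Aggregate IOPS: linear in disk count, capped by the bus."""
    return min(disks * per_disk, bus_limit)


for disks in (2, 4, 8):
    print(f"{disks} disks -> {array_iops(disks)} IOPS")
```

Under these assumed numbers, doubling from two to four disks yields 400 IOPS rather than 500, and further disks add nothing; with a higher bus ceiling, the same doubling would have reached the target.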

Here is one fact that you can absolutely take to the bank, no matter how powerful your server: Your users and staff will find a way to consume all of its resources at some point. This is a variant of Parkinson's Law (www.heretical.com/miscella/parkinsl.html): "Work expands so as to fill the time available for its completion."

Sizing Tools

When you are configuring a server for a particular purpose, assume that every operating system and application makes different demands on your server. Because software architectures vary widely, you need to characterize the operating system and the application to determine what load they can support. A hypothetical database application might be multithreaded and able to distribute its load among a number of processors; this would argue strongly for one or a few large SMP boxes. With existing large databases of this kind, you often find that I/O isn't the problem; raw computing power is. The system crunches a lot of data in response to a short query and returns to the client an answer that doesn't contain much data. Applications that require persistent connections, channels, or sockets are good candidates for scale-up server consolidations.

Upon closer inspection, you might find that the vendor's published data shows that a particular operating system/database application scales performance linearly from one to four processors but starts to lose multithreaded efficiency above four. By the time you get to eight processors, those eight processors run at 60% efficiency. This is in fact a very common scenario; it arises because the crossbar architecture that couples two quad-processor sets into an eight-way system isn't 100% efficient and doesn't scale perfectly. With that information, you have a first cut at right sizing your server: a quad server, or at most an eight-way, is the limit beyond which you do not want to go.
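The arithmetic behind that conclusion is short. This minimal sketch uses the figures above (linear scaling to four processors, 60% efficiency at eight) and treats efficiency as a simple multiplier, an assumption that ignores workload-specific effects.

```python
def effective_cpus(processors: int, efficiency: float) -> float:
    """Processor-equivalents of useful work at a given scaling efficiency."""
    return processors * efficiency


quad = effective_cpus(4, 1.00)   # linear scaling up to four processors
eight = effective_cpus(8, 0.60)  # 60% efficiency at eight processors
print(f"4-way: {quad:.1f}, 8-way: {eight:.1f}, gain: {eight / quad - 1:.0%}")
# 4-way: 4.0, 8-way: 4.8, gain: 20%
```

Doubling the processor count buys only about 20% more useful work, which is why the eight-way box is hard to justify on performance alone.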

Consider a webserver deployment, the traditional "scale-out" (rather than scale-up) application, in which very little processing is done at the server but I/O requirements are extreme. An application that doesn't require, or can't maintain, a persistent connection won't benefit much from a faster server; it will benefit more from having more network connections and more servers available to connect to. That's why webservers tend to be deployed in server farms.

With a webserver, much of the processing gets done on the client side. When you examine the metrics for the webserver/operating system combination you intend to use, you may find that the application can't make good use of more than two processors (another common scenario). This architecture therefore calls for a solution that uses many more computers, and in which the key bottleneck is the amount of I/O that can flow through the network interface.
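For a scale-out design, a first-cut estimate is how many servers it takes to carry peak network traffic. This sketch assumes the network interface is the bottleneck, as described above; the 70% utilization target, the single spare server, and the traffic figures are hypothetical values, not recommendations from the text.

```python
import math


def farm_size(peak_mbps: float, nic_mbps: float, target_util: float = 0.7,
              spares: int = 1) -> int:
    """Servers needed when network I/O, not CPU, limits each server.

    target_util and spares are illustrative assumptions: run each NIC below
    saturation and keep extra servers so one failure doesn't overload the farm.
    """
    usable_per_server = nic_mbps * target_util
    return math.ceil(peak_mbps / usable_per_server) + spares


# Hypothetical: 1,800 Mbps of peak traffic, one 1 Gbps NIC per server
print(farm_size(1800, 1000))  # 4
```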

Note

IBM has a redbook called "IBM e-server pSeries Sizing and Capacity Planning: A Practical Guide" that you can download from IBM's site, at www.redbooks.ibm.com/abstracts/SG247071.html?Open. This book contains a lot of conceptual material, including benchmarks, guidelines for application-specific sizing, a list and description of available sizing tools, and IBM's Balanced System Guidelines. Although the redbook is aimed at selecting a pSeries server, its sizing principles apply to other servers as well.

To decide how to size your server when you can't characterize the potential system loading, you need to look at similar systems deployed in the field. If you are lucky, you may know of other sites running servers and applications similar to the ones you plan to run, and you can piggyback on their experience. Chances are, though, that you don't, or that those companies aren't willing to share their experience with you, so the next logical place to start developing your capacity plan is with the server vendor selling you the system or the application vendor selling you the software. IBM, for example, has a configuration tool for this called eConfig that is available only to IBM and IBM Business Partners.

If you are purchasing a server from a large OEM such as Hewlett-Packard, IBM, or Dell, you might find that the company has developed online sizing tools that can aid your selection of a particular server from that manufacturer. For example, here are some sizing tools that you can use for different applications:

Dell sizing tools: If you visit www1.us.dell.com/content/topics/global.aspx/alliances/en/sizing?c=us&cs=19&l=en&s=dhs, you can download a number of applications, including sizing tools for Microsoft SQL Server 2000, Microsoft Exchange Server 2000 and 2003, Microsoft Windows 2000 Active Directory, PeopleSoft/JD Edwards, SAP, and Dell's Rack Adviser.

Hewlett-Packard ActiveAnswers tools: The site http://h18001.www1.hp.com/partners/microsoft/utilities/storagesystem.html provides a number of sizers, including those for Apache Web Server for Linux, Backup Solution, BroadVision One-to-One Enterprise 6.0 for Windows 2000, Citrix MetaFrame XP and Windows 2000 Terminal Services, Commerce One Enterprise Buyer Desktop Edition 2.0 for Windows NT or 2000, Microsoft Commerce Server for Windows 2000, Microsoft BizTalk Server 2000 on ProLiant DL360, Microsoft BizTalk Server 2000 on ProLiant DL320, Microsoft Internet Information Server for Windows NT, Microsoft Internet Information Services for Windows 2000, and Microsoft Solution for Internet Business.

Although these tools identify only particular Dell and Hewlett-Packard models, you can be reasonably confident that if a dual-processor server handles the load the sizer suggests, someone else's dual-processor system, or even your own home-built one, will be in the right ballpark.

Capacity Planning Principles

You can be a lot more precise when you have historical data guiding your selection of a server. In a real-world scenario, you can determine the parameters needed for optimum performance by examining logs from the previous generation of server(s). With a running system, you can analyze how resources are used and where the bottlenecks are, and you can determine the nature of the I/O for which you need to tune your system. Figure 20.2 shows the common steps in a capacity planning project.

Figure 20.2. A right sizing flowchart is a very useful tool when you are involved in a server resizing project.

Let's consider an example of how a capacity planning project might proceed.

The principles involved in capacity planning based on historical data are as follows:

Understand your application: You need to know your application's characteristics, including how it uses network bandwidth, RAM, disk sector size for I/O, and other important factors. If you don't know this information and can't find it in your application's documentation, you can probably get it from your vendor.

Tune your drive: If you set up your disk drive's characteristics to match your application's I/O characteristics, you can not only reduce the disk space you use but also dramatically improve performance.

Abstract your storage from your servers: If you use a consolidated storage server, such as an intelligent array, you can dramatically lower your costs and improve your fault tolerance.

Obtain historical data to identify your maximum system load: You need to know not only how high your system load gets, but also the days, times, and reasons it peaks.

Balance your system: Any one component of a system can become a bottleneck. Try not to undersize one subsystem's performance in order to emphasize another. IBM has an entire program and related set of calculations, called Balanced System Guidelines, that serves as a server sizing tool, but this tool is available only internally at IBM.

Determine your ROI: Any server deployment should have a return on investment calculation done to justify it.
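The ROI principle reduces to a simple ratio: net gain divided by cost. This sketch uses hypothetical dollar figures; a real justification would also weigh management savings, downtime costs, and the time value of money, all of which this toy version ignores.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Simple ROI: net gain as a fraction of what was spent."""
    return (total_benefit - total_cost) / total_cost


# Hypothetical: $150,000 in savings over the server's life for a $100,000 outlay
print(f"{roi(150_000, 100_000):.0%}")  # 50%
```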

An Example of Historical Data

At Company XYZ, the decision is made to consolidate a large number of Microsoft Exchange 2000 servers at several sites into a smaller number of more powerful systems at a single site: a scale-up consolidation project. The rationale is that fewer servers offer fewer points of failure and lower management costs. Exchange is an application that requires four persistent connections (sockets or channels) to clients, which makes it a good candidate for a scale-up project.

The project manager is tasked with right sizing the system(s) to be purchased and defining the parameters under which the system will operate. The company had determined that instant messaging (IM) reduced its total cost of ownership (TCO) and wanted to add server resources to its Exchange operation. There was also concern that as messages carried more and more rich content, message storage would need to be consolidated to handle the additional requirements.

The first step in the analysis was to collect data on the volume and types of email in use at the company. Logging was turned on, and a set of reports summarized email traffic in 48 half-hour increments each day. From these data, the following could be determined:

The volume of email: You can determine the volume of email for each half-hour, both as a number of messages and as the amount of data the system transfers in each time period.

The average volume: You can determine the average volume that each individual, group, or site places on the system.

When peak periods of email occur: Analysis of the historical message stream showed that maximum loads occurred on Monday mornings, 9:00 to 9:30 a.m. EST, when the largest group of company employees checked the mail that had arrived over the weekend. Another period of high load was Friday afternoons, 3:30 to 4:00 p.m. EST.

Average message size: The average (mean) email message was 50KB, but the median message was much smaller: approximately 70% of all emails were 5KB or less, and 5% were emails with attachments.

Average size of messages written to disk: Most messages written to disk were between 3KB and 4KB in size.

Messaging volume: Messaging volume was increasing approximately 50% per year.
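A minimal sketch of the kind of analysis described above, assuming the half-hour reports can be exported as (day, slot, message count) records. The records and message sizes below are made up for illustration; they show how a few large attachments pull the mean far above the median, the same pattern the Company XYZ data showed.

```python
from statistics import mean, median

# Hypothetical half-hour log records: (weekday, slot start, message count)
intervals = [
    ("Mon", "09:00", 4200), ("Mon", "09:30", 2900),
    ("Wed", "14:00", 1800), ("Fri", "15:30", 3600),
]
peak = max(intervals, key=lambda rec: rec[2])  # busiest half-hour
print("Peak interval:", peak)

# Hypothetical message sizes in KB: mostly small, plus a few attachments
sizes_kb = [2, 3, 3, 4, 4, 5, 5, 60, 120, 300]
print(f"mean {mean(sizes_kb):.1f} KB, median {median(sizes_kb):.1f} KB")
# mean 50.6 KB, median 4.5 KB
```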

Based on these characteristics, the project set the following goals:

Reduce the number of servers by a factor of four.

Run at 99.2% uptime or greater.

Eliminate all Exchange replication traffic on the corporate WAN.

Make all Exchange data bulletproof by moving them off SCSI-attached storage and onto consolidated intelligent storage arrays.

Create a backup and recovery scheme that limits both backup time and data recovery time to a window of no more than two hours.

Establish a design that can be replicated easily to scale the original deployment.
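Two of these goals translate directly into numbers worth checking. The 99.2% uptime target caps annual downtime, and the two-hour backup/recovery window implies a minimum sustained throughput. Only the 99.2% and two-hour figures come from the goals; the 500GB store size is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in a period at a given availability."""
    return (1 - availability) * period_hours


def required_mb_per_s(store_gb: float, window_hours: float) -> float:
    """Sustained throughput needed to move store_gb within window_hours."""
    return store_gb * 1024 / (window_hours * 3600)


print(f"{downtime_budget_hours(0.992):.1f} h/year allowed down")        # 70.1
print(f"{required_mb_per_s(500, 2):.1f} MB/s for a hypothetical 500GB store")  # 71.1
```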

Exchange 2000's database engine reads and writes in 4KB pages. (Other applications use different sizes, however: SQL Server, for example, uses 8KB pages.) The idea behind disk tuning is to match the server's READ/WRITE characteristics to the application. For Exchange, the two most important characteristics, the ones that performance logs and the literature identify as bottlenecks, are the READ-to-WRITE ratio and the I/O load, known as the request rate.

The actual ratio, as measured with utilities such as IOMeter or IOBench, was 3 to 1 READs to WRITEs on average. With the Windows Performance Monitor and other utilities, you can measure IOPS, which indicates storage system loading. The two relevant disk counters were Pending I/O Disk Requests and Average Disk Queue Length. From these measurements, the average I/O per user was determined to be approximately 1.5 IOPS. Memory counters also identified server RAM as a potential bottleneck.

System Selection

With all the aforementioned design parameters and application characteristics, the people specifying the server consolidation project could make their system selections. The original dual-processor servers were replaced by eight-way servers at a ratio of 4 to 1. Each server of this size was expected to support 3,500 users with the user characteristics described in the previous section. The Exchange messaging system was installed in a single domain on the same data center network.
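The per-user figure measured earlier scales to a per-server load. This sketch multiplies the 3,500 users per server by 1.5 IOPS per user and splits the result by the 3:1 READ/WRITE ratio. Keep in mind that 1.5 IOPS is an average; peak periods such as Monday morning will run higher.

```python
def frontend_iops(users: int, iops_per_user: float,
                  read_ratio: float, write_ratio: float) -> tuple[float, float, float]:
    """Split a per-server I/O load into reads and writes by a READ:WRITE ratio."""
    total = users * iops_per_user
    reads = total * read_ratio / (read_ratio + write_ratio)
    return total, reads, total - reads


total, reads, writes = frontend_iops(3500, 1.5, 3, 1)
print(total, reads, writes)  # 5250.0 3937.5 1312.5
```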

A large storage array system was chosen, and each Exchange data store was placed on a large single volume with a 4KB cluster size. This cluster size was a little larger than most messages written to disk (3KB to 4KB), so a typical message nearly filled its cluster without wasting much disk space. To attain the desired I/O, the number of 15,000rpm dual-homed Fibre Channel disks required was calculated from the storage vendor's performance data, and a disk cache size was established so that the desired throughput, plus an additional margin for growth, could be achieved.
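The cluster-size reasoning can be checked with a simplified model that treats each message as if it were allocated its own whole clusters. That is an assumption for illustration only, since Exchange stores messages inside its database files rather than as individual files, but it shows why a cluster just larger than the typical 3KB to 4KB write wastes little space.

```python
import math


def slack_kb(message_kb: float, cluster_kb: int = 4) -> float:
    """Space allocated but unused when a message occupies whole clusters."""
    clusters = math.ceil(message_kb / cluster_kb)
    return clusters * cluster_kb - message_kb


for size in (3, 3.5, 4, 5):
    print(f"{size} KB message -> {slack_kb(size)} KB unused with 4KB clusters")
```

With 8KB clusters, the same 3KB message would leave 5KB unused (`slack_kb(3, 8)`), which is the waste the smaller cluster avoids.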

Note

In selecting system components, the general rule of thumb is to identify which component will cost the most money and/or be the most difficult to fix if it isn't correctly sized for the project, and to select that component first so that it best meets the project's needs. For the Exchange project described in the Company XYZ example, that evaluation could have led the project manager to pay particular attention to either the servers (the most difficult to correct) or the large disk arrays (the most expensive). Usually the most expensive component is also the most difficult to correct.

Because WRITEs are slow and represent a large portion of the disk I/O, Company XYZ chose a RAID 0+1 array, which combines the speed of striping with the first-line redundancy of mirroring. It also decided to back up each data store with a BCV (business continuance volume) for failover in case the primary disk storage array failed. Backups were collected as snapshots, with a full backup of a BCV made every week and stored to tape. It was also established that snapshots take around two hours on average and that switching over to a BCV after a system failure could be accomplished within two hours, with most of that time spent bringing the BCV up-to-date from the transaction logs.
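Choosing RAID 0+1 affects the disk count, because every logical WRITE becomes two physical writes, one per mirror. This sketch applies that write penalty to the per-server READ/WRITE split derived from 3,500 users at 1.5 IOPS and a 3:1 ratio. The 180 IOPS per 15,000rpm disk is an assumed ballpark, not a figure from the text; the project itself relied on the storage vendor's performance data, and array cache would reduce the physical load.

```python
import math


def backend_iops(reads: float, writes: float, write_penalty: int = 2) -> float:
    """Physical disk I/O: mirroring turns each logical write into two writes."""
    return reads + writes * write_penalty


def disks_needed(backend: float, per_disk_iops: float) -> int:
    """Disks needed to carry the load, rounded up to an even count for mirroring."""
    n = math.ceil(backend / per_disk_iops)
    return n + (n % 2)


# 3,937.5 reads and 1,312.5 writes per server; 180 IOPS/disk is hypothetical
be = backend_iops(3937.5, 1312.5)
print(be, disks_needed(be, per_disk_iops=180))  # 6562.5 38
```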

In this particular example, the system was found to have a life cycle of around 3.5 years. There were also some surprises: virus-scanning software had an unexpected impact on performance; adjusting a user's disk quota affected what is called the single instance ratio (a single file is stored on disk, with pointers to it from many users' mailboxes); and the system could be successfully scaled.
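The 3.5-year life cycle and the roughly 50% annual growth in messaging volume noted earlier combine quickly. Assuming the growth rate holds steady and compounds annually (the text doesn't say it will), the end-of-life load is about four times the starting load, which is why the disk cache and disk count were sized with a margin for growth.

```python
def growth_factor(annual_growth: float, years: float) -> float:
    """Compound growth: the load multiplies by (1 + g) each year."""
    return (1 + annual_growth) ** years


print(f"{growth_factor(0.50, 3.5):.2f}x")  # 4.13x
```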

Although this project is an Exchange server consolidation, so its specific numbers probably won't apply to your project, the principles it illustrates are useful to anyone trying to deploy a right-sized server. By understanding your application's characteristics, measuring performance bottlenecks and key characteristics, and extrapolating correctly from your particular loading, you can make good equipment selections. As with all large projects, if your deployment involves multiple systems, it is prudent to create a test bed to determine whether your assumptions are borne out by fact before adding more systems.

Note

At this point, any deployment project should have produced a project specification that everyone involved has signed off on. The last thing you want to have happen is to deploy a server that is sized to service 3,500 users and have a high-level manager say that he or she said "5,500 users" or, worse yet, "350 users."

Having management sign off on a document ameliorates these kinds of problems, and it lays out all the assumptions you've made when you can't establish a completely factual model. Therefore, any good specification should document any caveats, establish a risk analysis, and make promises about performance in a conservative way.

A project specification document doesn't need to be Department of Defense fighter-jet long; it can be a spreadsheet and an email message, but it's something you need to create whenever you are spending someone else's money.
