Microsoft SQL Server 2000 High Availability

Step 1: Breaking Down the Requirements

The bullet points below indicate how you should approach a set of requirements. Do not try to define technology solutions at this stage. As database administrators (DBAs) or system engineers, as many of you reading this book probably are, your natural tendency is to lead with technology. Your goal, however, first and foremost, is to understand the requirements before attaching any technology to them. Any risks that are drawn up at this stage do not necessarily need answers; they will be answered in time as you move through the process.

Here is what you know so far:

Here are some questions you must answer:

At the same time, draw up a list of initial risks:

Step 2: Considering Technologies

With enough basic information in hand, you can take the known factors, questions, and risks and turn them into technology decision points. This must be done before you look at purchasing hardware. One of the biggest mistakes that many people or companies make, even without capacity planning information, is to purchase hardware based on a recent system put into production or some basic corporate standard. On paper, whatever the chief purchasing officer buys might seem as if it has the capacity you need, but does it really?

Tackling the database synchronization issue, you have a few choices available to you:

From the main OLTP database standpoint, how frequently is data changing? That is, are there many inserts, updates, and deletes occurring, or is your data changing by only a small percentage? This affects your maintenance plans and whether you implement log shipping or replication. How will you protect this database? Failover clustering seems like a logical choice because of its automatic failover, but it might be overkill depending on the rest of the solution, or you might not be able to afford it. Remember that cost always lurks in the shadows.
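One rough way to answer the change-rate question is to sample transaction log activity and table sizes over a typical business day. The following is a minimal sketch using commands available in SQL Server 2000; sampling during peak hours is an assumption for illustration, not a prescription from this chapter.

-- Show how full each database's transaction log is; sampling this
-- periodically gives a feel for how much data is changing.
DBCC SQLPERF(LOGSPACE)

-- Approximate row counts per user table (rowcnt in sysindexes is not
-- guaranteed to be exact in SQL Server 2000, but it is close enough
-- for capacity planning).
SELECT o.name AS TableName, i.rowcnt AS ApproxRows
FROM sysobjects o
JOIN sysindexes i ON o.id = i.id
WHERE o.xtype = 'U' AND i.indid IN (0, 1)
ORDER BY i.rowcnt DESC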

As a first stab, you decide to use failover clustering to protect the main OLTP database and the Distributor (they will be housed in one instance), transactional replication to populate the reporting server, and log shipping to protect the entire solution in the event of a whole site failure. Using log shipping means not only that you need another data center, but also that you accept the risk that, after a site failure, the log shipped database will serve as both the reporting database and the OLTP database. That might impact performance and would need to be tested.
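SQL Server 2000 Enterprise Edition provides built-in log shipping through the Database Maintenance Plan Wizard, but the underlying mechanics amount to a log backup, a file copy, and a restore on the standby. The snippet below is only a sketch of that manual form; the database name, share, and file paths are placeholders.

-- On the primary server: back up the transaction log to a share the
-- standby can reach.
BACKUP LOG SalesDB
    TO DISK = '\\StandbyServer\LogShip\SalesDB_tlog.trn'

-- On the warm standby: restore the log. WITH STANDBY leaves the database
-- readable between restores; use WITH NORECOVERY if it will not be queried.
RESTORE LOG SalesDB
    FROM DISK = 'D:\LogShip\SalesDB_tlog.trn'
    WITH STANDBY = 'D:\LogShip\SalesDB_undo.dat'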

You also decide to implement this entire solution on Microsoft Windows Server 2003 because it will provide the longest support lifetime, and you know the system will stay in production for at least two years.

Step 3: Designing the Architecture

There are two phases to designing your architecture: designing the hardware, and then dealing with people and processes.

Note

Because people and processes are extremely important in your design, they are covered extensively in other chapters. They are presented in this chapter only at a high level, and all considerations are reviewed.

Designing the Hardware

First, examine your disk requirements. On the primary OLTP system, you decide that it will serve as both the Publisher and the Distributor. You need to plan for the OLTP database, a distribution database, msdb usage, transaction logs, MS DTC, and a quorum disk, as shown in Table 11-1. You are looking at more than 700 GB of disk space for each copy of the data.

Table 11-1: Planning the Disk Subsystem

Requirement                              Disk Space Planned    Separate LUN
Publisher/main database (data portion)   600 GB                Yes
Publisher/main database (log portion)    200 GB                Yes
Distributor                              50 GB                 Yes
MS DTC                                   1 GB                  Yes
Quorum                                   1 GB                  Yes

Notice that extra space was allocated to all databases. This leaves some extra room in case the estimates provided turn out to be lower than what is actually needed, as well as allowing for additional growth.
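To turn the plan in Table 11-1 into an actual file layout, each portion of the database is pre-sized on its own LUN. The following is a minimal sketch only; the database name, drive letters, and file sizes are assumptions used for illustration, with F: standing in for the 600-GB data LUN and G: for the 200-GB log LUN.

-- Pre-size the data and log files on their dedicated LUNs so the files
-- do not have to autogrow during normal operation.
CREATE DATABASE SalesDB
ON PRIMARY
   ( NAME = SalesDB_data,
     FILENAME = 'F:\MSSQL\Data\SalesDB_data.mdf',
     SIZE = 512000MB,
     FILEGROWTH = 10240MB )
LOG ON
   ( NAME = SalesDB_log,
     FILENAME = 'G:\MSSQL\Log\SalesDB_log.ldf',
     SIZE = 102400MB,
     FILEGROWTH = 10240MB )
GO

-- Confirm file placement.
USE SalesDB
GO
EXEC sp_helpfile
GO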

Now it is time to address the problem of how to deal with the reporting server and warm standby. Should you use the warm standby as a replication primary? No, because you have clustering in place. It will be used purely for high availability, so you need 800 GB of disk space to match the primary. On the replicated server, you need the 800 GB as well. It is beginning to look as if you will incur a large cost by purchasing enough disk space to host the three copies of the data.

Next, examine your memory requirements. Because you can expand memory as time goes on (although that will incur some downtime), you decide to reduce costs and start with 4 GB of memory, then see how the application behaves before you expand to 8 GB. The /3GB boot.ini switch (described in Chapter 14, Administrative Tasks to Increase Availability) will be used, allowing SQL Server to use up to 3 GB of total memory and leaving 1 GB for the operating system. The same is true for processor power: you will start with four processors, but you will buy the fastest available to give you the longest system life. You should also specify that each system be the same brand and, if possible, the same model for ease of management and maintenance.
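As a rough illustration of this memory configuration, the /3GB switch is added to the boot.ini entry and SQL Server's upper memory limit is capped so the operating system keeps roughly 1 GB. The exact max server memory value shown here (2816 MB) is an assumption, not a figure from this chapter; Chapter 14 covers the switch itself.

-- boot.ini entry with the /3GB switch (the ARC path shown is illustrative):
--   multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB

-- Cap SQL Server's memory so about 1 GB remains for the operating system.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'max server memory (MB)', 2816
RECONFIGURE
GO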

People and Processes

At this point, start thinking about how you will monitor, maintain, and administer these systems in a production environment. The technologies you choose will dictate how you staff the project from a development and operational standpoint, as well as what expertise is needed. If you do not have the expertise in-house, you will need to hire or train people. Along the same lines, is the architecture you decided on actually viable for your environment? Will lack of expertise impede its adoption? Is it too complex? Will it meet all growth, performance, availability, reliability, and scalability goals? How will it affect SLAs? Do SLAs need to be rewritten? What backup schemes and other maintenance will need to be revised? Is there monitoring in place? If so, how can it be leveraged? If not, what is needed?
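If no monitoring is in place yet, even a simple scheduled check can serve as a starting point. The example below is hypothetical and not part of this chapter's solution; the database name is a placeholder, and in practice the check would run as a SQL Server Agent job that alerts an operator on failure.

-- Raise an error (which an Agent job or alert can act on) if the main
-- database is not online.
IF DATABASEPROPERTYEX('SalesDB', 'Status') <> 'ONLINE'
    RAISERROR ('SalesDB is not available on this server.', 16, 1)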

Step 4: Choosing Hardware and Costs

Although it does not matter in what order you configure the systems, this example starts with the cluster. Disks will more than likely represent your greatest cost. Do you go with larger, slower disks or smaller, faster disks? Start by looking at the number of disks needed for an optimal deployment.

Note

The next section goes into depth on cost and disk configuration. Although the amount of detail might be painful, it is necessary.

Server Cluster

Now you must configure your cluster based on the numbers you determined. Because there is only one main cluster system on your pick list, the decision is a bit easier.

Reporting Server and Log Shipped Secondary

Because you use racks in your data center, only stand-alone system option number two is valid. That translates into the following:

This calculates to a base cost of $61,700. Add an additional $250 for the log shipped secondary, because you want to use a private network to send the transaction logs, bringing the cost to $61,950 for that system. Those two elements added to the cluster now cost $123,900, for a new total of $406,100. Next comes the disk configuration for the two systems, which can be found on the Rack System - Non Cluster tab of the Excel spreadsheet. Remember that you will have separate storage area networks (SANs) for each system, which means that you need separate controllers, hubs, and so on. You could use the same SAN, but that would be an availability risk. The cost of the hub and switch was already factored into the price of the systems. Again, the decision is made to use 72-GB disks with mirroring and striping for the data and RAID 5 for the system databases. The cost per system is $93,600, for a total of $187,200. You are now at $593,300, nearly $100,000 over your budget. Because you are over budget, this solution is not viable, and there is no way you can even consider buying test systems without rethinking things.

Rethinking the Strategy

Because the reporting server is in the same data center as the cluster, you can utilize the same SAN (assuming it is on the multicluster list). That would take the server price down to $50,000 for that system (including 4 GB of memory and a fibre adapter). That would put the server costs at $164,400 for the cluster nodes, plus $50,000 for the replication server and $61,950 for the log shipped server. Combining the disk space for the cluster and the replication server on one SAN and choosing the same disk option (which can be found on the One SAN - Cluster and Repl tab of the Excel spreadsheet) costs $224,400, and the disk for the log shipped server is $93,600. All of this totals $594,350, which means you still have a cost problem. You now decide to switch to an all-RAID 5 disk solution on the cluster/replication SAN, using 36-GB 10,000-rpm disks, which costs $168,200. That reduces the total to $538,150, which is better but still over budget. You then decide to go with a RAID 5 data solution for log shipping, which brings that SAN cost down to $61,200, reducing the overall cost to $505,750. You are exposed here because you are filling all 36 slots with in-use disks, leaving no room for hot spares.

Although you are still over budget, you go to management to explain the trade-offs you had to make and to get approval for the additional $5,750, which, over the life of the system, works out to be just under $240 per month for the two years. Although you have what seems to be a winning design, management questions the need for the log shipping server and says that if they approve any additional money to go over budget, it should be used for a testing system. This puts your availability in a disaster recovery scenario in jeopardy.

You need to factor in the disk enclosure space. Because you need to use the 36-GB 10,000 rpm disks, that is $11,400 for the disks. Because you are using 82 slots and each enclosure has 12 slots, you must add two more disk enclosures for a total of $34,000. That brings the system cost back up to $539,750. If the log shipped system ever becomes the primary, you will need to do backups on that as well. You will need to use 36-GB disks, which will cost $10,925 with RAID 5. Because you have filled up the slots on the three disk enclosures, you will need another two, which adds up to $26,400. Adding this $37,325 to the total, you are now at $577,075.

At this point, you have the following problems:

Unfortunately, there is no good solution without significantly increasing the budget. If you eliminate the log shipped server, which dramatically puts you in harm's way in a disaster recovery scenario, you can subtract $160,475, bringing you under budget for the system as it is configured. However, you might lose $160,475 or more in revenue and leads if the system goes down. Will saving money on test systems further increase your risk of problems in production?

At this point, you need to make some hard decisions. What is most important to the core solution? Obviously the main OLTP system is required. Up for debate are replication and log shipping, as well as the reporting functionality as a whole. If you eliminate the replicated reporting server, you can go back to using mirrors and stripes for the main production system and allow reporting to take place on it. This would result in $164,400 for the nodes, using 36.4-GB 15,000-rpm disks and a form of mirroring and striping for the data. You decide to use faster, smaller disks to get a performance benefit because you will have more users accessing the system. You no longer need a distribution database, so you need only the items shown in Table 11-2.

Table 11-2: New Server Configuration

Function                     # of Disks    Cost
Quorum (RAID 1)              2             $2,000
MS DTC (RAID 1)              2             $2,000
System databases (RAID 5)    3             $3,000
OLTP data                    36            $36,000
OLTP log                     14            $14,000
Backups                      40            $40,000
Total                        97            $97,000

Taking into account the nine disk enclosures at $17,000 per unit, that is a cost of $153,000 for a total disk subsystem price of $250,000. The cluster will now cost $414,400. However, because you are adding load to the system, you add another 4 GB ($8,500) of memory per node, bringing the total cost to $431,400. This solution should satisfy your main concerns about the reporting and main OLTP capabilities. The one question here is your use of tempdb. If it is heavily used, you will need more or separate disks, but you now have the drive space for it, not to mention the room for hot spares, so you have also increased your expandability and availability.
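If tempdb does turn out to be heavily used, it can be relocated to dedicated disks without rebuilding the server. The statements below are a minimal sketch; the T: drive letter is an assumed placeholder for whichever spare LUN you carve out, and the change takes effect only after the SQL Server service restarts.

-- Move tempdb's data and log files to a dedicated drive.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'T:\MSSQL\Data\tempdb.mdf')
GO
ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'T:\MSSQL\Log\templog.ldf')
GO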

You now have $68,600 left in your budget. This is not enough money for either a test system or a log shipped secondary for disaster recovery. At this point, you need to look at the following options:

The reality is that the last point, documenting your risks, is most likely to be the case, and the remaining $68,600 will go into other aspects such as cost overruns in development. The money might not be spent, but that decision will be made by multiple people weighing the risk and reward of doing so.
