Microsoft SQL Server 2000 High Availability

Whether you are upgrading from another version of SQL Server, consolidating your database environment, or migrating to SQL Server from another platform, you need to address some basics before you even begin the process. This section focuses primarily on the technical aspects, but it is crucial to understand the business reasons, or drivers, for undertaking such a project to ensure that the technical solution serves the business goals, and vice versa. Follow the guidelines in Chapter 1 and Chapter 2 for setting up a steering committee and so on. Any planning fits squarely within the Microsoft Solutions Framework (MSF) and Microsoft Operations Framework (MOF) models.

Your business sponsors should be able to answer the following questions. The information will greatly help you in your technical planning.

Phase 1: Envisioning

While the business side of the house is figuring out the nontechnical drivers for your effort, the administrators (DBAs and non-DBAs alike) must start to gather information for the plan on the technical side. This involves a great deal of information that you should have been gathering all along to establish baselines and make judgments. Ask the following sample questions and document the answers, as each one will influence your planning. Add questions if they are relevant to your environment.

Profile Target Systems

The most important component of any upgrade, consolidation, or migration is having adequate technical information about each system. Without it, you cannot proceed successfully with any of your efforts. If you do not know whether you can combine workloads, it is impossible to make informed decisions.

Each system that is being considered must be profiled, and not for just one day or, sometimes, even just one week. The profile is the result of monitoring over time to gather trends. These trends might be daily or weekly jobs, as well as daily, weekly, and even monthly trends in workload (for example, end-of-month sales reports). Profiling includes much more than just the system name: every important performance aspect of the server must be captured. When figuring out if workloads and systems can be combined, you need to determine if incompatibilities will result from, say, a configuration setting or an application requirement.

Besides capturing general items such as operating system version, service pack levels, IP addresses, and disk configurations through the worksheets, the specific settings for each application also need to be captured. SQL Server provides the following methods to assist in the documentation of your systems:

You should also document all of the objects associated with each database. This includes stored procedures, jobs, logins, and system-level logins as well as their corresponding database users, maintenance plans, and so on, even if they reside outside of the database. You will use this information later to determine conflicts, such as duplicate stored procedure names and logins or differences in naming standardizations. If you want, run the same DBCC commands on each system. Document anything and everything, even if you think it might not be relevant. Include linked servers, modified system databases, and so on.

More Info

For more information on gathering information about your systems, consult Chapter 4, Disk Techniques for High Availability, Chapter 12, and Chapter 15, Monitoring for High Availability. The run book described in Chapter 12 is a great resource for information. Chapter 4 and Chapter 15 cover monitoring and capacity management from a system performance perspective.

Gathering System Performance Metrics

Gathering performance metrics from your server is more than a one-time event. To truly understand what is going on with your system, performance metrics must be gathered over a period of time and then analyzed. At a minimum, you should gather statistics about your systems for one business week. During that time, record numbers when there is little to no load, medium load, and heavy load. Gathering numbers only at times that show one extreme or the other does not paint an accurate picture of system use.

Other SQL Server Information to Gather

This section describes the most common data to collate, which should help you plan for the issues that are most commonly encountered when consolidating. Refer to the section on technical considerations under the planning section of this chapter for more information and details on most of these points.

Consolidating System Databases

Be sure to take care of the following:

Collation and Sort Order

Check out the following:

Security (Logins, Security, and Permissions)

Be sure you have information about the following items:

Serverwide Configuration Settings

Here are some factors to check out:

Determining the Existing Environment (Workstations, Servers, Applications, Users, and so on)

Some more information you must gather follows:

Health Checks

Take care of these, too:

Multiple Versions

Take care of this:

Excessive Use of Tempdb

Take the following actions:

Database Options

Answer these questions, too:

Phase 2: Technical Considerations for Planning

Once you gather the necessary information, you can then perform the analysis to put the proper plans in place.

Single Instance Versus Multiple Instances

Before you can consider any other technical aspect, you must decide if you will be using one instance with many databases or multiple instances, each with its own set of databases. Instances were first introduced in SQL Server 2000. One instance of SQL Server 2000 equates to one installation of SQL Server on your machine. Prior to SQL Server 2000, you could only have one installation of SQL Server per physical server. There are two types of instances: default and named. A default instance is analogous to the current functionality of previous versions of SQL Server. A named instance of SQL Server is exactly like it sounds: you name your SQL installation with a unique name. SQL Server 2000 supports up to 16 instances on one physical server or in a single Windows server cluster, regardless of the number of physical nodes comprising the cluster. In other words, you can have 16 named instances or one default and 15 named instances. Consult SQL Server Books Online for any changes in the number of instances supported.

Note

You can only have one default instance, so older applications that cannot handle named instances might have problems. Be sure to test such applications if you are implementing named instances.

Each instance gets its own set of core binaries, but all instances share underlying components such as MDAC, Microsoft Distributed Transaction Coordinator (MS DTC), and one Microsoft Search service. If there is more than one instance of SQL Server, each can stay at its own service pack level, with the exception of any shared components. This is a benefit for a consolidation effort, because you can keep instances at the specific levels you need. However, if you need to install a SQL Server 2000 service pack that for some reason requires a physical reboot, you will affect all other services running on that server. In general, once you have upgraded one instance to that service pack level, you will not have to reboot to upgrade the other instances because the shared files have already been replaced. Regardless, your availability requirements and SLAs must be taken into account when designing consolidated SQL Servers.

From a performance perspective, each instance will get its own memory, including cache for stored procedures. Is it better for your endeavor to put hundreds of databases under one instance or to spread them out over more than one? The Memory and Processor section later in this chapter delves into this topic in more depth.

In most cases, a single instance means less administrative overhead: with only one instance to manage, the complexity of what goes on in SQL Server can be easily managed and localized. It also ensures that automatic settings, such as using dynamic memory, are easier to consider if there are no other processes to contend with. If you have limited hardware, this might be your best option.

Multiple instances allow for each instance to cater to a different set of rules, SLAs, and so on. The biggest advantage is that you can stop the services for each instance individually. Multiple instances also mean more complexity when it comes to system design and management on all levels, but the rewards of dedicated memory and separate processes (with the exception of shared components) might be worth the trade-off.

When thinking about single versus multiple instances, as stated in the introductory paragraph of this section, I/O, processor, and memory are the key considerations.

Application Compatibility with Instances

Because the architecture of SQL Server 2000 is different with instances, in particular named instances, you should test older applications to ensure that they still work properly. Named instances technically require MDAC 2.6 to connect properly; a default instance should be fine. If an application cannot connect to a named instance, there are a few potential ways to solve the problem:

Disk Subsystem

The first stop for any SQL Server system should be the architecture of its disk subsystem. If you are looking to combine workloads or to upgrade, it is imperative to know not only how much disk space you will need but, if you are consolidating, what workloads will work well together from an I/O perspective. That is a serious challenge faced by customers both big and small. Read-only reporting databases have different I/O requirements than heavily used OLTP systems with many writes and a smaller number of reads. Do not take this configuration lightly; you might initially save cost by consolidating, but if you have underperforming applications, the consolidated environment will be a failure.

More Info

Disks were covered in great depth in Chapter 4, Disk Techniques for High Availability. Refer to that chapter for any further discussions relating to disk issues.

Memory and Processor

There are a few important considerations when determining how many processors the new servers will contain. As with disk and memory, prior benchmarking and application and system profiling data will need to exist to help you make a more accurate decision. Here are some of the main issues:

Networking

You might now encounter more network traffic flowing to one SQL Server. As with any other planning consideration, do you currently know how many concurrent users are usually connected to the SQL Server at various times, especially under maximum load? From a memory perspective, each connection consumes 12 KB plus three times the configured network packet size:

Connection Memory (KB) = 12 + (3 × network packet size)

Therefore, in addition to the memory required for the normal SQL Server usage for things like procedure cache and queries, you need to take into account network connections. This factors into whether you decide to consolidate on one instance or multiple instances.

Once you know how much memory user connections consume for each database, you need to consider network bandwidth. Can the proposed networking on the server handle what is coming in and going out? For example, if you have one 10-Mb network card but your overall required throughput is 20 Mb, do you add another network card and IP address (both of which add a bit of system overhead), or do you get a higher bandwidth network card and ensure that the network itself can handle the load?
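The connection-memory formula above can be sketched as a back-of-the-envelope calculation. This is an illustrative sizing aid, not a Microsoft tool; the 4 KB default packet size and the user count of 500 are assumptions for the example.

```python
# Rough sizing of per-connection memory for a consolidated SQL Server,
# per the formula: 12 KB base plus three buffers of the network packet size.
# The default 4 KB packet size and the user counts are illustrative.

def connection_memory_kb(packet_size_kb=4):
    """Memory (KB) consumed by one user connection."""
    return 12 + (3 * packet_size_kb)

def total_connection_memory_mb(concurrent_users, packet_size_kb=4):
    """Aggregate connection memory in MB for a given user load."""
    return concurrent_users * connection_memory_kb(packet_size_kb) / 1024

per_conn = connection_memory_kb()           # 24 KB per connection at 4 KB packets
total_mb = total_connection_memory_mb(500)  # ~11.7 MB just for 500 connections
print(f"{per_conn} KB per connection, {total_mb:.1f} MB for 500 users")
```

Numbers like these are small per connection, but they add up quickly when many databases and applications share one instance, which is why they belong in the single-versus-multiple-instance decision.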

Other networking issues include the following:

Security and Logins

Security is an obvious concern. Are there different standards for all of the applications, databases, and servers targeted? If there are, will they be in conflict with one another if they are combined? Or, for example, is something like IPSec required, but you are consolidating to a cluster where IPSec is not designed for failover, meaning the application will need to be changed or possibly not consolidated but put instead on its own instance or server?

Logins and how applications and users actually access SQL Server are the biggest areas of concern for security in a consolidation effort. User migration could consume a great deal of planning time. The Networking section earlier in this chapter already mentioned domain connectivity and domain accounts, but here the focus is on the actual logins. Do you have servers that have a mix of both Windows Authentication and mixed mode? Will that introduce risk when workloads are combined?

Also, security has changed a bit if you are only familiar with SQL Server 6.5 and earlier. SQL Server 7.0 introduced the concept of roles, so you might want to rethink how security is done for a particular application. This might become scope creep, so do not consider or implement it if it will impede the consolidation effort.

Here are some issues to consider with logins:

High Availability

Each instance of SQL Server 2000 needs to be made highly available to meet your SLAs. Because this book is all about high availability, putting solutions together, planning, and so on, this section is pretty self-evident.

Replication

If replication is configured on any, or many, of the databases being considered, planning will certainly be impacted. Each database or publisher needs a distributor, and the distributor can live on the publisher, on a subscriber, or on a separate system. Do you plan on having a 1:1 ratio of publishers to distributors? That is probably unrealistic in a consolidated environment, which means that you will need to do some up-front capacity planning to ensure that the distributor can handle multiple publishers. Disk I/O will be impacted greatly, followed by processor and memory, especially depending on the type of replication implemented. This also means that the distributor will have to exist in a completely separate instance or, most likely, on a different server to ensure good performance and high availability. You might consider a remote distributor as well. Remote distributors for various publishers can be consolidated under one SQL Server instance, but the distribution databases should remain distinct per publisher. One instance hosting both a publisher and its distributor might be acceptable in a nonconsolidated environment, but it will not be acceptable in a consolidated environment. You also have to consider the base snapshot directory and its location per publisher: will you share one per SQL Server instance or use a separate one for each?

Also consider the subscribers, as they will be impacted, too, if you move publishers and distributors. This is important if you have to disable replication during the implementation process.

Migrating Objects

One of the hardest tasks for any upgrade, consolidation, or migration effort will be migrating objects, such as alerts, SQL Server Agent jobs, and Data Transformation Services (DTS) packages. For alerts, jobs, and operators, you can easily script them; however, before applying them to the consolidated SQL Server, ensure that there are no naming conflicts and that when you migrate to the new server, the proper user is associated with SQL Server Agent jobs. For DTS packages, you have a few options to migrate the packages to other servers, but once you move them, you might need to make modifications to them. Do not assume that they will run as is. Also ensure that the package format is compatible: as of SQL Server 7.0 Service Pack 2, Microsoft changed the DTS package format from that of SQL Server 7.0 RTM and Service Pack 1. In that case, you would have to completely re-create the package under SQL Server 2000. Remember to account for time in your testing phase for testing and possibly fixing these types of issues.

Tip

There is a utility called the Copy Database Wizard included that can help move databases and their corresponding users, jobs, and other elements. An interesting option with this utility is to select the Run Later option and allow it to create five separate DTS tasks to move these pieces. You can then run all five or any combination of them that you need to transfer your database information.

Administration

Prior to migration, think about the following considerations that impact the eventual administration of each consolidated SQL Server instance:

Chargeback

Chargeback is the process of assessing the cost to utilization of the hardware so business units or customers can be charged appropriately. If chargeback is a consideration, you have a few options. One is to use SQL Server. Third-party tools such as ARMTech from Aurema can assist you in your cost accounting for system resource usage.
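The core of any chargeback scheme, whether built on SQL Server data or a third-party tool, is allocating a shared cost in proportion to measured usage. The following is a minimal sketch of that allocation; the business unit names, CPU-second figures, and monthly cost are invented for illustration.

```python
# Hypothetical chargeback sketch: split a consolidated server's monthly
# cost across business units in proportion to measured resource usage.
# Units, usage numbers, and cost are illustrative assumptions.

def chargeback(monthly_cost, usage_by_unit):
    """Allocate monthly_cost proportionally to each unit's usage."""
    total = sum(usage_by_unit.values())
    return {unit: round(monthly_cost * used / total, 2)
            for unit, used in usage_by_unit.items()}

# CPU-seconds consumed per business unit over the billing period:
usage = {"Sales": 4000, "Finance": 2500, "HR": 1500}
bills = chargeback(8000.00, usage)
print(bills)  # {'Sales': 4000.0, 'Finance': 2500.0, 'HR': 1500.0}
```

In practice the usage figures would come from whatever metering you put in place (Performance Monitor logs, SQL Server trace data, or a tool such as ARMTech), and the allocation basis (CPU, I/O, storage) is a business decision.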

System Databases

System databases are an important consideration, especially in a consolidation effort. The system databases contain data and structures that are relevant to the entire server installation. Unlike user databases, which can exist independently of each other, system databases are closely associated with each other as well as with the user databases from the source server.

How will master, model, and msdb be configured? Because you are now combining multiple servers, each with its own copy of the databases, you cannot just restore multiple copies, as the last restore will overwrite the previous copy. Remember that msdb includes all of your jobs, alerts, and operators (as well as history for some features such as log shipping), so plan on a size of at least 45 MB, and then add the additional amount per instance as well as about 10 percent overage.
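The msdb sizing guideline above can be turned into a quick estimate. The 45 MB base and 10 percent overage come from the text; the per-consolidated-server allowance used here is an assumption for illustration, not a Microsoft figure, and should be replaced with the actual msdb sizes measured on your source servers.

```python
# Estimate msdb size for a consolidated server: 45 MB base, an assumed
# per-source-server allowance, plus 10 percent overage. The 15 MB
# per-server figure is a placeholder assumption, not a documented value.

def msdb_size_mb(servers_consolidated, base_mb=45, per_server_mb=15,
                 overage=0.10):
    """Planned msdb size in MB, rounded to one decimal place."""
    raw = base_mb + servers_consolidated * per_server_mb
    return round(raw * (1 + overage), 1)

print(msdb_size_mb(4))  # 45 + 4*15 = 105 MB, plus 10% -> 115.5 MB
```

Measuring the real msdb size on each source server (jobs, alerts, operators, and history all live there) and substituting those numbers gives a far better estimate than any flat per-server allowance.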

Besides these space requirements, you need to analyze master, model, tempdb, and msdb to detect duplicate database objects and document them. This extends from logins, stored procedures, user-defined data types, and tasks through to the specifics of each object. For instance, an object with the same name on two source servers might perform actions particular to each server, or objects with different names might perform the same server-level action. This becomes particularly difficult when modifications have been made to these databases beyond logins. Remember to include any and all startup stored procedures.

You must identify any elements that are not part of a standard SQL Server build or installation, whether they are modifications or new objects. Review each object for relevance in a consolidated environment. Review each object to determine if duplication exists at a more granular level. Review each object for explicit references, such as path names, server names, and task names. Do not address tasks found in msdb, as those are handled separately and must be dealt with on their own. Only search for nondefault objects found in tempdb. Also consider in advance how you will respond to application needs for previous compatibility levels. In general, these tips can save you much trouble in having to update applications, but remember that being fully on the current version usually minimizes support difficulty and costs.

Collations and Sort Orders

You must also take into account the collations and sort orders between servers. It is more likely that databases of varying collations will coexist in a consolidated environment than in numerous stand-alone servers. You can encounter unexpected results after migrating to a consolidated environment, and you must realize that certain applications depend on the collation and sort order setting of the database. If they are not set properly, the applications will not work after the switch to the consolidated environment.

Temporary tables will be created in tempdb with the collation settings of tempdb if collation syntax is not included. Passwords in a case-insensitive SQL Server are converted to uppercase before being stored or used. Passwords in a case-sensitive SQL Server are not converted to uppercase. Because of this difference, passwords originally encrypted on a case-sensitive server and later transferred to a case-insensitive server cannot be used unless all alphabetic characters in the password are uppercase.

Similarly, understand how you need NULL to behave in your queries. If you turn the database option ANSI_NULLS on, all comparisons to a null value evaluate to NULL, which equates to unknown. If this option is off, comparisons of non-Unicode data evaluate as true if both values are null (that is, NULL = NULL is true). This is set to Off by default, so you should see the latter behavior. Also look at any other ANSI-related setting that might affect your environment.

More Info

For more information about collations, see the Effect on Passwords of Changing Sort Orders topic in either the SQL Server 6.5 Books Online or 7.0 Books Online and Selecting Collations in SQL Server Books Online.

Other Technical Considerations

Here are other technical considerations to think about prior to migration:

Phase 3: Consolidation Planning (Server Design and the Test Process)

Once the envisioning phase is done and guiding principles are agreed on and documented, work on planning the consolidation can begin. There are two main parts to planning: designing what the consolidated servers will look like and testing the implementation process.

Designing the New Production Servers

When you start to design the new environment, you need to take into account every aspect from the ground up: administration and operations (including monitoring), performance, backup and recovery, chargeback (if necessary), disaster recovery, high availability, security, and so forth. Do not assume that you can just use what you have in place for your current system, as the new environment might be different. Also, with more databases and SQL Server instances, you will have different rules and options that need to be reconciled. At this stage, if you identify new tools that need to be either built or acquired, document and plan for those, too.

Migration of Applications, Users, and Data

One of the important factors for the DBA is how the applications, users, and data will be migrated to the new consolidated environment. At this stage, any potential SQL Server 6.5 and SQL Server 7.0 migration issues should be completely isolated and identified, from Transact-SQL incompatibilities to system settings and everything in between. You might also need an intermediate staging server. Determine the order in which you will migrate the applications, users, and data. Resolve any conflicts in object names and location. Worry about the end users' experience and how you will make the effort as transparent as possible for them. Also determine how you will notify your end users about the move. How will business processes be impacted?

Avoid scope creep. You are consolidating, not adding new functionality or enhancements to applications or data schemas. Problems could be identified during the migration planning that, as long as they will not break in the consolidated environment, should be left alone. Once you have consolidated, you might want to consider these problems, but trying to solve them at this stage only causes the consolidation to take much longer than necessary.

Finally, do not waste time adding an application to a consolidation until that application is reasonably stable. Stability is crucial; many of the troubleshooting tools that might be required to diagnose a flaky application (for example, extra perfmon counters, debug builds, and so on) could adversely affect the performance of the entire system.

Test the Process

Prior to actually doing the planned upgrade or consolidation in a production environment, you must build and test all of the new procedures, tools, and the entire migration. The capacity and capability of the environment used for testing should be the same as, or at least close to, the final production environment. However, you may be constrained by budget, resources, and other limitations. The importance of this stage is to isolate, identify, document, and fix any defects, errors, or problems that you encounter so a production effort will, one hopes, be flawless. Devising the proper test plans will ensure a successful production rollout. The plans should identify the following:

Building a test platform is not an easy task. In some cases, it does not need to be an identical system or set of systems from the standpoint of number of processors, amount of memory, or total storage accessible by the cluster nodes, but in all other ways the test platform and the production environment should be identical. The ability of the system or systems to be partitioned affects this as well. Partitioned systems can be set up so that a small partition has exactly the same BIOS, chipset, devices (including firmware and BIOS versions), driver versions, kernel components (for example, antivirus, storage management, security monitor, system management utilities, firewall, disk mirroring, and so on), and user mode applications. All versions of all components should match the production environment and configuration; otherwise, the confidence gained from testing an update or upgrade is lowered. Systems that do not support partitioning must be mirrored separately, but it is still important to ensure that all the components are identical between the test system and the production system. The same logic applies to clusters.

The most critical factor affecting whether the update or upgrade will be successful (and not cause any unplanned downtime) is interactions between different versions of hardware and software components and not the total size of the system as measured by number of processors, amount of memory, total storage, number of network or storage adapters, and so on. Thus, a test system or partition need not be exactly the same capacity as the production system, but it should be representative. An example for the use of a partitionable system might be that the production environment is 12, 16, or even more processors, but to be representative, the test partition might only need to contain 4 CPUs. Similarly, the production system might include 32 GB, 64 GB, or even more RAM, but to be representative, the test partition might only need to contain 8 GB of RAM.

Continuing with the same logic, it might not be necessary to fully reproduce the storage capacity of the production system, but only to provide the same behavior and characteristics, scaled to match the smaller test configuration. Some storage arrays support multiple systems and provide internal partitioning. The test system could be attached to the same partitionable storage resource as the production system but scaled down, providing the same storage behavior and characteristics for the test system as the production system. Finally, if you are using a server cluster, it is important to model the test cluster to match the production cluster configuration as closely as possible. Doing this ensures that planned and unplanned failovers allow the application or entire solution to function correctly, guaranteeing the highest possible availability. You can accomplish this testing with just two cluster nodes even if you have a four-node cluster in production, but if you do not match capabilities as closely as possible, your testing may be somewhat invalid.

The reasons that versions are the key factor to test, rather than size or capacity, include the following:

The use of a scaled-down environment is usually possible, with the most glaring exception being when the production system is running close to its maximum capacity, as measured in processor utilization, memory utilization, paging activity, network bandwidth, storage queues, and so on. In this case, even small changes in the behavior of components might cause the production system to begin to function noticeably more slowly and not be responsive enough for production use. This is unavoidable and simply a function of how any system responds when the work demand exceeds its ability to complete that work. In a worst case, the system might effectively be unusable. Thus, customers should always leave headroom in the production system that can absorb an unexpected performance impact. This also gives the system the same headroom for spikes in demand. As a rule of thumb, many customers limit average utilizations of various system resources to between 65 percent and 70 percent. This guideline, however, may not apply to your systems at all. There is a difference between peak usage and overall average use of systems. Those same customers begin planning to expand or otherwise upgrade the system capacity when those levels are consistently exceeded.
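The 65 to 70 percent headroom rule of thumb described above can be expressed as a simple check over sampled utilization figures. The threshold and the sample values here are illustrative; as the text notes, the right threshold depends on your systems and on the difference between peak and average load.

```python
# Sketch of the headroom rule of thumb: flag a resource for capacity
# planning when its average utilization consistently exceeds a threshold
# (65-70 percent is the guideline cited in the text). Samples are invented.

def needs_capacity_planning(samples, threshold=0.70):
    """True if average utilization across the samples exceeds the threshold."""
    return sum(samples) / len(samples) > threshold

cpu_samples = [0.62, 0.75, 0.81, 0.77, 0.73]  # hourly CPU utilization fractions
print(needs_capacity_planning(cpu_samples))    # average 0.736 -> True
```

A real check would separate peak windows from off-hours averages and track each resource (processor, memory, paging, network, storage queues) independently, but the decision logic is the same.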

Once the test systems are built and are identical with the production systems, baseline testing should occur to validate functionality and assess performance characteristics and how they scale in comparison to the production systems.

There is some expense involved for the test system or systems: adapters, fabrics, networks, storage, and possibly clusters and other hardware components. This is also true for the various software components, such as the applications and utilities that run on the system. However, you need to measure these expenses against the cost of unplanned downtime that results from a failure, or from an update or upgrade that goes bad and results in hours, or even more, of downtime. For all of these components, both hardware and software, customers should ask their vendors about costs and licensing, vendor policy on product instances used strictly for testing purposes, and other concerns.

Finally, it might be necessary to have some clients and a small network set up so that loads can be applied to the system or components using synthetic or dummy loads supplied by vendors or built by the customer.

Important

Pay attention to time in your testing. Too often, testing occurs in the weeks before a migration, when there is plenty of time. Your staff might kick off jobs and go home. The next morning, the jobs have completed successfully and everybody is happy. However, if you go to migrate production applications in the 8-hour window of downtime you have arranged with users and find out the migration takes 12 hours, you have a problem. Every test task should measure not only the migration function, but also the time it takes to perform that function. You might find that your available downtime window limits how much of the task you can accomplish in a night or a weekend.
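The point of the tip above is arithmetic: time the test run, scale to production volume, and compare against the agreed window. This sketch assumes a simple linear extrapolation from row counts; the row counts, test duration, and 8-hour window are hypothetical, and real migrations do not always scale linearly.

```python
# Extrapolate a timed test migration to production volume and check it
# against the downtime window. Linear scaling by row count is an
# assumption; the figures below are invented for illustration.

def fits_window(test_seconds, test_rows, prod_rows, window_seconds):
    """True if the extrapolated production run fits the downtime window."""
    estimated = test_seconds * (prod_rows / test_rows)
    return estimated <= window_seconds

# A test run moved 1 million rows in 30 minutes; production holds
# 20 million rows and the agreed downtime window is 8 hours:
print(fits_window(30 * 60, 1_000_000, 20_000_000, 8 * 3600))
# False: the extrapolated run needs roughly 10 hours
```

When the check fails, the options are the ones the text implies: negotiate a longer window, split the migration across several nights or a weekend, or speed up the migration tasks themselves.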

Determining Risk

It is important to determine the risks, whether based on people, technology, or cost, against which the effort will be measured. The success of a plan depends on knowing where potential failures can occur so you can mitigate them and see if the risks are worth taking. The best thing you can do to determine these risks is to test the entire process. However, you should also discuss the risks with anyone who has migrated these applications in the past. They might save you a lot of trouble by helping you avoid problems they encountered.

Phase 4: Developing

The developing stage is when the plans are realized and implemented for the first time in a testing or staging environment. This requires hardware completely separate from the eventual production servers, as it is crucial to work out any potential problems before going live. Ideally, the servers used in this stage have the same capacity and horsepower as their production counterparts, but that might be unrealistic due to budgetary constraints. The bottom line is that the better your testing or staging environments mirror your production environment, the more successful you will be when the actual production migration occurs.

There are at least three steps during this phase: technology validation, proof of concept, and piloting.

Once you decide on and plan for the technology that will be used during this phase, it is time to see if the technology performs as expected or if it is not appropriate at all. This is one of your last chances to modify the consolidation effort's physical design. Once the technology has been proven, it is time to roll out a proof of concept to show that the final environment will work as expected. The proof of concept should mirror the production environment exactly, just on a smaller scale. Finally, choose one of the servers that will be consolidated and make a pilot of it. Even if the technology has been evaluated and the proof of concept was successful, you still need a proper pilot to prove that in a production capacity (unlike the proof of concept, which is on a much smaller scale) the consolidation will benefit the entire business. If the pilot is a failure or is scrapped for other reasons, this is the time to figure things out.

Follow the configuration plans and migration plans as documented. If problems arise, document them, and if there is a fix, document the fix and revise the plans accordingly. After migrating applications, data, and users to the new consolidated SQL Server, put it through its paces: use it as it would be used on a daily basis, and test the servers under load. It is crucial to ensure that the databases that are now consolidated under one SQL Server will work well together when all applications are experiencing high utilization. Otherwise, there will be performance issues in production.

If any issues arise due to coexistence after testing, document the issues, and possibly rethink the strategy for that particular application's databases. Mitigating risk at this stage is crucial. Remember to test not only the migration, but also any administration plans, scripts, procedures, tools, and so on that will be used in the eventual production environment. Just testing the migration process is not enough.

Phase 5: Deploying and Stabilizing

There are two main steps in this phase: back-end deployment and application deployment.

Back-end deployment is when the backbone of the system (hardware, operating system, networking, SQL Server, and so on) has been completely configured and tested. Application deployment is just that: configuring the databases and rolling out the applications in the new consolidated environment. At some point, you reach the go/no-go point, beyond which you cannot easily go back to the old environment. Stabilization continues through and beyond this stage.

Once you are confident that the proper hardware designs have been crafted and the migration plans are well tested, it is time to build and deploy the production consolidated SQL Servers. Even if your plans are well tested, do not perform the entire consolidation at once. Take a phased approach: deploy one consolidated, migrated, or upgraded server; thoroughly test it; compare it with the original environment; and then finally retire the old environment. Only at this point should you consider consolidating another SQL Server, because if you do not completely verify that one effort went as expected and problems are then encountered during another migration, you could obscure and add to the problems. If an effort was unsuccessful for whatever reason, update your plans accordingly. Keep the old environment available until you have completed several or all of the upgrades, just in case.
