Business Continuity Planning and Disaster Recovery Planning
Overview
The Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP) domain is all about business. We’re not talking about infringements of security policy or unauthorized access; rather, this is about making contingency plans for a business-threatening emergency and continuing the business in the event of a disaster. While the other domains are concerned with preventing risks and protecting the infrastructure against attack, this domain assumes that the worst has happened.
The 21st century is shaping up to be the “disaster” century; it’s sure starting out that way. A lot has been said about 9/11; it was the largest implementation of Disaster Recovery Plans in American history. A great number of recovery stories sprang out of that event, and many companies had to improvise well past their plans. In the publishing world, for example, TheStreet.com and the daily newspaper American Banker ran from various journalists’ homes for several weeks afterward. The August 2003 East-Coast power blackout was proof that what looks good on paper may not work in the real world. The distributed power grid was supposed to isolate power faults and create a fault-tolerant system, whereas in actuality the grid cascaded the faults onto other utilities’ grids.
The effects of a disaster may not be immediately felt. For instance, in August 2001 a large office fire on Wall Street displaced many companies, many of which were able to continue business after the immediate evacuation and relocation. However, a later study showed that 80 percent of the businesses failed within 3 to 5 years after the event, because they could never fully recover their client base or credibility. Their clients were happy with alternative vendors; the event gave their competitors too strong a foothold in their space.
The CISSP candidate should know the following:
- The basic difference between BCP and DRP
- The difference between natural and man-made disasters
- The four prime elements of BCP
- The reasons for and steps in conducting a Business Impact Assessment (BIA)
- The steps in creating a disaster recovery plan
- The five types of disaster recovery plan tests
- The various types of backup services
The BCP and DRP domains address the preservation of business in the face of major disruptions to normal operations. Business Continuity Planning and Disaster Recovery Planning involve the preparation, testing, and updating of the actions required to protect critical business processes from the effects of major system and network failures. The CISSP candidate must have an understanding of the preparation of specific actions required to preserve the business in the event of a major disruption to normal business operations.
The BCP process includes the following:
- Scope and plan initiation
- Business Impact Assessment (BIA)
- Business continuity plan development
DISASTER DEFINITION
The disaster, emergency management, and business continuity community consists of many different types of entities, such as governmental (federal, state, and local), nongovernmental (business and industry), and individuals. Each entity has its own focus and its own definition of a disaster. A very common definition of a disaster is “a suddenly occurring or unstoppable developing event that:
- Causes loss of life, suffering, loss of valuables, or damage to the environment
- Overwhelms local resources or efforts
- Has a long-term impact on social or natural life that is always negative in the beginning”
The DRP process includes the following:
- Disaster Recovery Planning (DRP) processes
- Testing the disaster recovery plan
- Disaster recovery procedures
Business Continuity Planning
Simply put, business continuity plans are created to prevent interruptions to normal business activity. They are designed to protect critical business processes from natural or man-made failures or disasters and the loss of capital resulting from the unavailability of normal business processes. Business continuity planning is a strategy to minimize the effect of disturbances and to allow for the resumption of business processes.
A disruptive event is any intentional or unintentional security violation that suspends normal operations. The aim of BCP is to minimize the effects of a disruptive event on a company. The primary purpose of business continuity plans is to reduce the risk of financial loss and enhance a company’s capability to recover promptly from a disruptive event. The business continuity plan should also help minimize the cost associated with the disruptive event and mitigate the risk associated with it.
Business continuity plans should look at all critical information-processing areas of the company, including but not limited to the following:
- LANs, WANs, and servers
- Telecommunications and data communication links
- Workstations and workspaces
- Applications, software, and data
- Media and records storage
- Staff duties and production processes
Life safety, or protecting the health and safety of everyone in the facility, is the first priority in an emergency or disaster. Although we talk about the preservation of capital, resumption of normal business-processing activities, and other business continuity issues, the main, overriding concern of all plans is to get the personnel out of harm’s way. Evacuation routes, assembly areas, and accounting for personnel (head counts and last known locations) are the most important elements of emergency procedures. If at any time there’s a conflict between preserving hardware or data and the threat of physical danger to personnel, the protection of the people always comes first. Personnel evacuation and safety must be the first element of a disaster response plan. Providing restoration and recovery and implementing alternative production methods come later.
ASSET LOSS
The loss of assets entails more than just the hard costs of replacing destroyed systems. Other examples of business assets that could be lost or damaged during a disaster are:
- Revenues lost during the incident
- Ongoing recovery costs
- Fines and penalties incurred by the event
- Competitive advantage, credibility, or good will damaged by the incident
Continuity Disruptive Events
The events that can affect business continuity and require disaster recovery are well documented in the Physical Security domain (Chapter 10). Here, we are concerned with those events, either natural or man-made, that are of such a substantial nature as to pose a threat to the continuing existence of the organization. All the plans and processes in this section are “after the fact”; that is, no preventative controls similar to the controls discussed in the Operations Security domain (Chapter 6) will be demonstrated here. Business continuity plans are designed to minimize the damage done by the event and facilitate rapid restoration of the organization to its full operational capability.
We can make a simple list of these events, categorized as to whether their origination was natural or human. Examples of natural events that can affect business continuity are as follows:
- Fires, explosions, or hazardous material spills of environmental toxins
- Earthquakes, storms, floods, and fires due to acts of nature
- Power outages or other utility failures
Examples of man-made events that can affect business continuity are:
- Bombings, sabotage, or other intentional attacks
- Strikes and job actions
- Employee or operator unavailability due to emergency evacuation or other issues (these could be either man-made or naturally caused)
- Communications infrastructure failures or testing-related outages (including a massive failure of configuration management controls)
The Four Prime Elements of BCP
There are four major elements of the BCP process:
- Scope and Plan Initiation. This phase marks the beginning of the BCP process. It entails creating the scope and the other elements needed to define the parameters of the plan.
- Business Impact Assessment. A BIA is a process used to help business units understand the impact of a disruptive event. This phase includes the execution of a vulnerability assessment.
- Business Continuity Plan Development. This term refers to using the information collected in the BIA to develop the actual business continuity plan. This process includes the areas of plan implementation, plan testing, and ongoing plan maintenance.
- Plan Approval and Implementation. This process involves getting the final senior management signoff, creating enterprisewide awareness of the plan, and implementing a maintenance procedure for updating the plan as needed.
Scope and Plan Initiation
The Scope and Plan Initiation phase is the first step toward creating a business continuity plan. This phase marks the beginning of the BCP process. It entails creating the scope for the plan and the other elements needed to define the parameters of the plan. This phase embodies an examination of the company’s operations and support services. Scope activities could include creating a detailed account of the work required, listing the resources to be used, and defining the management practices to be employed.
With the advent of the personal computer in the workplace, distributed processing introduces special problems into the BCP process. It’s important that the centralized planning effort encompass all distributed processes and systems.
Roles and Responsibilities
The BCP process involves many personnel from various parts of the enterprise. Creation of a BCP committee will represent the first enterprisewide involvement of the major critical functional business units. All other business units will be involved in some way later, especially during the implementation and awareness phases.
- The BCP committee. A BCP committee should be formed and given the responsibility to create, implement, and test the plan. The committee is made up of representatives from senior management, all functional business units, information systems, and security administration. The committee initially defines the scope of the plan, which should deal with how to recover promptly from a disruptive event and mitigate the financial and resource loss due to a disruptive event.
- Senior Management’s Role. Senior management has the ultimate responsibility for all phases of the plan, which includes not only initiation of the plan process but also monitoring and management of the plan during testing and supervision and execution of the plan during a disruptive event. This support is essential, and without management being willing to commit adequate tangible and intangible resources, the plan will not be successful.
The business resumption, or business continuity, plan must have total, highly visible senior management support. Senior management must agree on the scope of the project, delegate resources for the success of the project, and support the timeline and training efforts.
Also, many elements of the BCP will address senior management, such as the statement of importance and priorities, the statement of organizational responsibility, and the statement of urgency and timing. Table 8-1 shows the roles and responsibilities in the BCP process.
WHO | DOES WHAT |
---|---|
Executive management staff | Initiates the project, gives final approval, and gives ongoing support |
Senior business unit management | Identifies and prioritizes time-critical systems |
BCP committee | Directs the planning, implementation, and test processes |
Functional business units | Participate in implementation and testing |
CONTINGENCY PLANNERS
Contingency planners have many roles and responsibilities when planning business continuity, disaster recovery, emergency management, or business resumption processes. Some of these roles and responsibilities can include:
- Providing direction to senior management and ensuring executive management compliance with the contingency plan program
- Integrating the planning process across business units
- Providing periodic management reports and status
- Ensuring the identification of all critical business functions
- Coordinating and integrating the activation of emergency response organizations
THE FCPA
The Foreign Corrupt Practices Act of 1977 imposes civil and criminal penalties if publicly held organizations fail to maintain adequate controls over their information systems. Organizations must take reasonable steps to ensure not only the integrity of their data but also the system controls the organization put in place.
Some organizations with mature business resumption plans (BRPs) employ a tiered structure that mirrors the organization’s hierarchy. Senior management is always the highest level of decision makers in the BRP process, although the policy group also consists of upper-level executives. The policy group approves emergency management decisions involving expenditures, liabilities, and service impacts. The next group, the disaster management team, often consists of department and business unit representatives and makes decisions regarding life safety and disaster recovery efforts. The next group, the emergency response team, supplies tactical response to the disaster and may consist of members of data processing, user support, or persons with first aid and evacuation responsibilities.[*]
Because of the concept of due diligence, stockholders may hold senior managers as well as the board of directors personally responsible if a disruptive event causes losses that adherence to base industry standards of due care could have prevented. For this reason and others, it is in the senior managers’ best interest to be fully involved in the BCP process.
Senior corporate executives are increasingly being held liable for failure of due care in disasters. They may also face civil suits from shareholders and clients for compensatory damages. The definition of due care is being updated to include computer functionality outages as more and more people around the world depend upon information to do their jobs.
Business Impact Assessment
The purpose of a BIA is to create a document to be used to help understand what impact a disruptive event would have on the business. The impact may be financial (quantitative) or operational (qualitative, such as the inability to respond to customer complaints). A vulnerability assessment is often part of the BIA process.
BIA has three primary goals:
- Criticality Prioritization. Every critical business unit process must be identified and prioritized, and the impact of a disruptive event must be evaluated. Obviously, non–time-critical business processes will require a lower priority rating for recovery than time-critical business processes.
- Downtime Estimation. The BIA is used to help estimate the Maximum Tolerable Downtime (MTD) that the business can tolerate and still remain a viable company; that is, what is the longest period of time a critical process can remain interrupted before the company can no longer recover? It is often found during the BIA process that this time period is much shorter than expected; that is, the company can tolerate only a much briefer period of interruption than was previously thought.
- Resource Requirements. The resource requirements for the critical processes are also identified at this time, with the most time-sensitive processes receiving the most resource allocation.
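The three BIA goals above can be illustrated with a minimal sketch. It assumes each critical process has already been given an MTD estimate in hours. The process names, MTD values, and resource lists are hypothetical and are not taken from any real assessment. The sketch simply ranks processes so that the one with the shortest MTD is recovered first.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BusinessProcess:
    name: str
    mtd_hours: float             # Maximum Tolerable Downtime estimated during the BIA
    resources: Tuple[str, ...]   # resources the process needs in order to resume


def recovery_order(processes):
    """Return processes sorted so the shortest MTD (most time-critical) comes first."""
    return sorted(processes, key=lambda p: p.mtd_hours)


# Hypothetical sample data, for illustration only.
PROCESSES = [
    BusinessProcess("payroll", 72, ("HR system", "bank link")),
    BusinessProcess("order entry", 4, ("web servers", "order database")),
    BusinessProcess("customer service", 24, ("phone system", "CRM")),
]

for rank, process in enumerate(recovery_order(PROCESSES), start=1):
    needs = ", ".join(process.resources)
    print(f"{rank}. {process.name}: MTD {process.mtd_hours} h, needs {needs}")
```

Sorting on MTD alone is a simplification. A real BIA would also weigh the financial and operational impact statements before fixing the recovery priorities.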
A BIA generally takes the form of these four steps:
- Gathering the needed assessment materials
- Performing the vulnerability assessment
- Analyzing the information compiled
- Documenting the results and presenting recommendations
Gathering Assessment Materials
The initial step of the BIA is identifying which business units are critical to continuing an acceptable level of operations. Often, the starting point is a simple organizational chart that shows the business units’ relationships to each other. Other documents may also be collected at this stage in an effort to define the functional interrelationships of the organization.
As the materials are collected and the functional operations of the business are identified, the BIA will examine these business function interdependencies with an eye toward several factors, such as determining the business success factors involved, establishing a set of priorities between the units, and deciding what alternate processing procedures can be utilized.
The Vulnerability Assessment
The vulnerability assessment is often part of a BIA. It is similar to a Risk Assessment in that there is a quantitative (financial) section and a qualitative (operational) section. It differs in that the vulnerability assessment is smaller than a full risk assessment and is focused on providing information that is used solely for the business continuity plan or disaster recovery plan.
A function of a vulnerability assessment is to conduct a loss impact analysis. Because there will be two parts to the assessment (a financial assessment and an operational assessment), it will be necessary to define loss criteria both quantitatively and qualitatively.
Quantitative loss criteria can be defined as follows:
- Incurring financial losses from loss of revenue, capital expenditure, or personal liability resolution
- The additional operational expenses incurred because of the disruptive event
- Incurring financial loss from resolution of violation of contract agreements
- Incurring financial loss from resolution of violation of regulatory or compliance requirements
Qualitative loss criteria can consist of the following:
- The loss of competitive advantage or market share
- The loss of public confidence or credibility, or incurring public embarrassment
During the vulnerability assessment, critical support areas must be defined in order to assess the impact of a disruptive event. A critical support area is defined as a business unit or function that must be present to sustain continuity of the business processes, maintain life safety, or avoid public relations embarrassment.
Critical support areas could include the following:
- Telecommunications, data communications, or information technology areas
- Physical infrastructure or plant facilities, transportation services
- Accounting, payroll, transaction processing, customer service, purchasing
The granular elements of these critical support areas will also need to be identified. By granular elements we mean the personnel, resources, and services that the critical support areas need to maintain business continuity.
Common steps to performing a vulnerability assessment could be[*]:
- List potential emergencies, both internal to your facility and external in the community. Natural events, man-made events, technological failures, and human error are all categories of potential emergencies.
- Estimate the likelihood that each emergency could occur, in a subjective analysis.
- Assess the potential impact of the emergency on the organization in the areas of human impact (death or injury), property impact (loss or damage), and business impact (market share or credibility).
- Assess external and internal resources required to deal with the emergency, and determine whether they are located internally or whether external capabilities or procedures are required.
Figure 8-1 shows a sample vulnerability matrix. This can be used to create a subjective impact analysis for each type of emergency and its probability. The lower the final number the better, as a high number means a high probability, impact, or lack of remediation resources.
TYPE OF EMERGENCY | Probability | Human Impact | Property Impact | Business Impact | Internal Resources | External Resources | Total |
---|---|---|---|---|---|---|---|
(scale) | High 5↔1 Low | High Impact 5↔1 Low Impact | High Impact 5↔1 Low Impact | High Impact 5↔1 Low Impact | Weak Resources 5↔1 Strong Resources | Weak Resources 5↔1 Strong Resources | |
Figure 8-1: Sample vulnerability assessment matrix.
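As a rough sketch of how the matrix in Figure 8-1 might be totaled, the following assumes each of the six factors is scored from 1 to 5 and that the total is a plain sum, so a lower total is better. The emergency types and scores are hypothetical sample values, and the factor names are illustrative identifiers rather than part of any standard.

```python
FACTORS = (
    "probability",
    "human_impact",
    "property_impact",
    "business_impact",
    "internal_resources",   # 5 = weak resources, 1 = strong resources
    "external_resources",   # 5 = weak resources, 1 = strong resources
)


def total_score(scores):
    """Sum the six factor scores after checking each is on the 1-5 scale."""
    for factor in FACTORS:
        value = scores[factor]
        if not 1 <= value <= 5:
            raise ValueError(f"{factor} must be between 1 and 5, got {value}")
    return sum(scores[factor] for factor in FACTORS)


# Hypothetical subjective scores, for illustration only.
MATRIX = {
    "fire": dict(probability=2, human_impact=5, property_impact=5,
                 business_impact=4, internal_resources=3, external_resources=2),
    "power outage": dict(probability=4, human_impact=1, property_impact=2,
                         business_impact=4, internal_resources=2, external_resources=3),
    "flood": dict(probability=1, human_impact=3, property_impact=4,
                  business_impact=3, internal_resources=4, external_resources=3),
}

# Highest total first: these emergencies deserve the most planning attention.
ranked = sorted(MATRIX, key=lambda name: total_score(MATRIX[name]), reverse=True)
for name in ranked:
    print(f"{name}: {total_score(MATRIX[name])}")
```

An unweighted sum treats every factor as equally important. An organization could reasonably weight human impact more heavily, in keeping with the priority given to life safety.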
THE CRITICALITY SURVEY
A criticality survey is another term for a standardized questionnaire or survey methodology, such as the INFOSEC Assessment Methodology (IAM), or it could be a subset of the Systems Security Engineering Capability Maturity Model (SSE-CMM). Its purpose is to help identify the most critical business functions by gathering input from management personnel in the various business units.
Analyzing the Information
During the analysis phase of the BIA, several activities take place, such as documenting required processes, identifying interdependencies, and determining what an acceptable interruption period would be.
The goal of this section is to clearly describe what support the defined critical areas will require to preserve the revenue stream and maintain predefined processes, such as transaction processing levels and customer service levels. Therefore, elements of the analysis will have to come from many areas of the enterprise.
Documentation and Recommendation
The last step of the BIA entails a full documentation of all the processes, procedures, analyses, and results and the presentation of recommendations to the appropriate senior management.
The report will contain the previously gathered material, list the identified critical support areas, summarize the quantitative and qualitative impact statements, and provide the recommended recovery priorities generated from the analysis.
Business Continuity Plan Development
Business Continuity Plan development refers to using the information collected in the BIA to create the recovery strategy plan to support these critical business functions. Here the planner takes the information gathered from the BIA and begins to map out a strategy for creating a continuity plan.
This phase consists of two main steps:
- Defining the continuity strategy
- Documenting the continuity strategy
Defining the Continuity Strategy
To define the BCP strategy, the information collected from the BIA is used to create a continuity strategy for the enterprise. This task is large, and many elements of the enterprise must be included in defining the continuity strategy, such as:
- Computing. A strategy needs to be defined to preserve the elements of hardware, software, communication lines, applications, and data.
- Facilities. The strategy needs to address the use of the main buildings or campus and any remote facilities.
- People. Operators, management, and technical support personnel will have defined roles in implementing the continuity strategy.
- Supplies and equipment. Paper, forms, HVAC, or specialized security equipment must be defined as they apply to the continuity plan.
In developing plans, consideration should be given to both short-term and long-term goals and objectives. Short-term goals can include:
- Vital personnel, systems, operations, and equipment
- Priorities for restoration and mitigation
- Acceptable downtime before restoration to a minimum level of operations
- Minimum resources needed to accomplish the restoration
Long-term goals and objectives can include[*]:
- The organization’s strategic plan
- Management and coordination of activities
- Funding and fiscal management
- Management of volunteer, contractual, and entity resources
THE INFORMATION TECHNOLOGY DEPARTMENT
The IT department plays a very important role in identifying and protecting the company’s internal and external information dependencies. Also, the information technology elements of the BCP should address several vital issues, including:
- Ensuring that the organization employs an adequate data backup and restoration process, including off-site media storage
- Ensuring that the company employs sufficient physical security mechanisms to preserve vital network and hardware components, including file and print servers
- Ensuring that the organization uses sufficient logical security methodologies (authentication, authorization, etc.) for sensitive data
- Ensuring that the department implements adequate system administration, including up-to-date inventories of hardware, software, and media storage
Documenting the Continuity Strategy
Documenting the continuity strategy simply refers to the creation of documentation of the results of the continuity strategy definition phase. You will see the word documentation a lot in this chapter. Documentation is required in almost all sections, and it is the nature of BCP/DRP to require a lot of paper.
Plan Approval and Implementation
As the last step, the business continuity plan is implemented. The plan itself must contain a roadmap for implementation. Implementation here doesn’t mean executing a disaster scenario and testing the plan, but rather it refers to the following steps:
- Approval by senior management
- Creating an awareness of the plan enterprisewide
- Maintenance of the plan, including updating when needed
- Senior management approval. As previously mentioned, senior management has the ultimate responsibility for all phases of the plan. Because they have the responsibility for supervision and execution of the plan during a disruptive event, they must have final approval. When a disaster strikes, senior management must be able to make informed decisions quickly during the recovery effort.
- Plan awareness. Enterprisewide awareness of the plan is important. There are several reasons for this, including the fact that the capability of the organization to recover from an event will most likely depend on the efforts of many individuals. Also, employee awareness of the plan will emphasize the organization’s commitment to its employees. Specific training may be required for certain personnel to carry out their tasks, and quality training is perceived as a benefit that increases the interest and the commitment of personnel in the BCP process.
- Plan maintenance. Business continuity plans often get out of date: a major similarity among recovery plans is how quickly they become obsolete, for many different reasons. The company may reorganize, and the critical business units may be different than when the plan was first created. Most commonly, the network or computing infrastructure changes, including the hardware, software, and other components. The reasons also may be administrative: Cumbersome plans are not easily updated, personnel lose interest or forget, or employee turnover may affect involvement.
Whatever the reason, plan maintenance techniques must be employed from the outset to ensure that the plan remains fresh and usable. It’s important to build maintenance procedures into the organization by using job descriptions that centralize responsibility for updates. Also, create audit procedures that can report regularly on the state of the plan. It’s also important to ensure that multiple versions of the plan do not exist, because they could create confusion during an emergency. Always replace older versions of the text with updated versions throughout the enterprise when a plan is changed or replaced.
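One way to picture the audit procedure described above is a small check that flags outdated copies of the plan and an overdue review. This is only a sketch. The holder names, version labels, dates, and the 365-day review interval are all assumptions chosen for illustration, not recommendations from the text.

```python
from datetime import date


def outdated_copies(copies, current_version):
    """Return the holders whose copy of the plan is not the current version."""
    return sorted(holder for holder, version in copies.items()
                  if version != current_version)


def review_overdue(last_review, today, max_age_days=365):
    """True when more than max_age_days have passed since the last plan review."""
    return (today - last_review).days > max_age_days


# Hypothetical distribution list: business unit -> plan version held.
COPIES = {
    "finance": "2.0",
    "operations": "2.1",
    "it": "2.1",
    "customer service": "1.4",
}

stale = outdated_copies(COPIES, current_version="2.1")
overdue = review_overdue(date(2024, 1, 15), today=date(2025, 3, 1))
print("Units holding old versions:", stale)
print("Review overdue:", overdue)
```

Running a check like this on a schedule, and assigning its output to a named role, is one way to centralize responsibility for updates.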
[*]Source: Paul H. Rosenthal, “Business Contingency Planning 201,” Contingency Planning and Management (May 2000).
[*]Source: FEMA, “Emergency Management Guide for Business and Industry,” August 1998.
[*]Source: National Fire Protection Association, “NFPA 1600 Standard on Disaster/Emergency Management and Business Continuity,” 2000 edition.
Disaster Recovery Planning (DRP)
I don’t think anyone can question the importance of a working, tested, reality-based Disaster Recovery Plan (DRP). A disaster recovery plan is a comprehensive statement of consistent actions to be taken before, during, and after a disruptive event that causes a significant loss of information systems resources. Disaster Recovery Plans are the procedures for responding to an emergency, providing extended backup operations during the interruption, and managing recovery and salvage processes afterwards, should an organization experience a substantial loss of processing capability.
The primary objective of the disaster recovery plan is to provide the capability to implement critical processes at an alternate site and return to the primary site and normal processing within a time frame that minimizes the loss to the organization by executing rapid recovery procedures.
When planning for a disaster, it’s important to try to account for the unexpected consequences of both the disaster and the remediation. When you try to “expect the unexpected,” however, that doesn’t mean you can literally and financially prepare for every contingency. Preparing as well as possible for what you can will reduce the negative impact of unforeseen events. If 70 percent, 80 percent, or 90 percent of the recovery goes smoothly and according to plan, the unexpected events will have a much smaller impact on survivability of the business.
Disasters primarily affect availability, which affects the ability of the staff to access the data and access working systems, but a disaster can also affect the other two tenets: confidentiality and integrity.
Goals and Objectives of DRP
A major goal of DRP is to provide an organized way to make decisions if a disruptive event occurs. The purpose of the disaster recovery plan is to reduce confusion and enhance the ability of the organization to deal with the crisis.
Obviously, when a disruptive event occurs, the organization will not have the luxury to create and execute a recovery plan on the spot. Therefore, the amount of planning and testing that can be done beforehand will determine the capability of the organization to withstand a disaster.
The objectives of the DRP are multiple, but each is important. They can include the following:
- Protecting an organization from major computer services failure
- Minimizing the risk to the organization from delays in providing services
- Guaranteeing the reliability of standby systems through testing and simulation
- Minimizing the decision making required by personnel during a disaster
In this section, we will examine the following areas of DRP:
- The DRP process
- Testing the disaster recovery plan
- Disaster recovery procedures
The Disaster Recovery Planning Process
This phase involves the development and creation of the recovery plans, which are similar to the BCP process. However, BCP is involved in BIA and loss criteria for identifying the critical areas of the enterprise that the business requires to sustain continuity and financial viability; the DRP process assumes that those identifications have been made and the rationale has been created. Now we’re defining the steps we will need to perform to protect the business in the event of an actual disaster. Table 8-2 shows a common scheme to classify the recovery time frame needs of each business function.
RATING CLASS | RECOVERY TIMEFRAME REQUIREMENTS |
---|---|
AAA | Immediate recovery needed; no downtime allowed |
AA | Full functional recovery required within four hours |
A | Same day business recovery required |
B | Up to 24 hours downtime acceptable |
C | 24 to 72 hours downtime acceptable |
D | Greater than 72 hours downtime acceptable |
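The classification scheme in Table 8-2 can be sketched as a simple lookup from tolerable downtime to rating class. The table does not say how many hours "same day" means, so the sketch assumes an 8-hour business day. That value is an assumption and should be adjusted to the organization's own definition.

```python
SAME_DAY_HOURS = 8  # assumption: "same day" is treated as one 8-hour business day


def rating_class(tolerable_downtime_hours):
    """Map a business function's tolerable downtime (in hours) to a Table 8-2 class."""
    if tolerable_downtime_hours < 0:
        raise ValueError("tolerable downtime cannot be negative")
    if tolerable_downtime_hours == 0:
        return "AAA"   # immediate recovery, no downtime allowed
    if tolerable_downtime_hours <= 4:
        return "AA"    # full functional recovery within four hours
    if tolerable_downtime_hours <= SAME_DAY_HOURS:
        return "A"     # same day business recovery
    if tolerable_downtime_hours <= 24:
        return "B"     # up to 24 hours downtime acceptable
    if tolerable_downtime_hours <= 72:
        return "C"     # 24 to 72 hours downtime acceptable
    return "D"         # greater than 72 hours downtime acceptable


# Hypothetical business functions and downtime tolerances, for illustration only.
for function, hours in [("trading desk", 0), ("order entry", 4), ("payroll", 48)]:
    print(f"{function}: class {rating_class(hours)}")
```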
DISASTER RECOVERY PLAN SOFTWARE TOOLS
Several vendors distribute automated tools to create disaster recovery plans. These tools can improve productivity by providing formatted templates customized to the particular organization’s needs. Some vendors also offer specialized recovery software focused on a particular type of business or vertical market. A good source of links to various vendors is located at:
- www.intiss.com/intisslinks.
The steps in the disaster planning process phase are:
- Data Processing Continuity Planning. Planning for the disaster and creating the plans to cope with it.
- Data Recovery Plan Maintenance. Keeping the plans up-to-date and relevant.
Data Processing Continuity Planning
The various means of processing backup services are all important elements to the disaster recovery plan. Here we look at the most common alternate processing types:
- Mutual aid agreements
- Subscription services
- Multiple centers
- Service bureaus
- Other data center backup alternatives
Mutual Aid Agreements
A mutual aid agreement (sometimes called a reciprocal agreement) is an arrangement with another company that may have similar computing needs. The other company may have similar hardware or software configurations or may require the same network data communications or Internet access as your organization.
In this type of agreement, both parties agree to support each other in the case of a disruptive event. This arrangement is made on the assumption that each organization’s operations area will have the capacity to support the others in a time of need. This is a big assumption.
There are clear advantages to this type of arrangement. It allows an organization to obtain a disaster-processing site at very little or no cost, thereby creating an alternate processing site even though a company may have very few financial resources to create one. Also, if the companies have very similar processing needs (that is, the same network operating system, the same data communications needs, or the same transaction processing procedures), this type of agreement may be workable.
This type of agreement has serious disadvantages, however, and really should be considered only if the organization has the perfect partner (a subsidiary, perhaps) and has no other alternative to disaster recovery (i.e., a solution would not exist otherwise). One disadvantage is that it is highly unlikely that each organization’s infrastructure will have the extra, unused capacity to enable full operational processing during the event. Also, in contrast to a hot or warm site, this type of arrangement severely limits the responsiveness and support available to the organization during an event and can be used only for short-term outage support.
The biggest flaw in this type of plan is obvious if we ask what happens when the disaster is large enough to affect both organizations. A major outage can easily disrupt both companies, thereby canceling any advantage that this agreement may provide. The capacity and logistical elements of this type of plan make it seriously limited.
Subscription Services
Another type of alternate processing scenario is presented by subscription services. In this scenario, third-party commercial services provide alternate backup and processing facilities. Subscription services are probably the most common of the alternate processing site implementations. They have very specific advantages and disadvantages, as we will see.
There are three basic forms of subscription services with some variations:
- Hot site
- Warm site
- Cold site
Hot Site
This is the Cadillac of disaster recovery alternate backup sites. A hot site is a fully configured computer facility with electrical power, heating, ventilation, and air conditioning (HVAC) and functioning file/print servers and workstations. The applications that are needed to sustain remote transaction processing are installed on the servers and workstations and are kept up-to-date to mirror the production system. Theoretically, operators and other personnel should be able to walk in and, with a data restoration of modified files from the last backup, begin full operations in a very short time. If the site participates in remote journaling - that is, mirroring transaction processing with a high-speed data line to the hot site - even the backup time may be reduced or eliminated.
This type of site requires constant maintenance of the hardware, software, data, and applications to ensure that the site accurately mirrors the state of the production site. This adds administrative overhead and can be a strain on resources, especially if a dedicated disaster recovery maintenance team does not exist.
The advantages to a hot site are numerous. The primary advantage is that 24/7 availability and exclusivity of use are ensured. The site is available immediately (or within the allowable time tolerances) after the disruptive event occurs. The site can support an outage for a short time as well as a long-term outage.
Some of the drawbacks of a hot site are as follows:
- It is by far the most expensive alternative. Full redundancy of all processing components (e.g., hardware, software, communications lines, and applications) is expensive, and the services provided to support this function will not be cheap.
- It is common for the service provider to oversell its processing capabilities, betting that not all its clients will need the facilities simultaneously. This situation could create serious contention for the site’s resources if a disaster is large enough to affect a major geographic region.
- There also exists a security issue at the hot site, because the applications may contain mirrored copies of live production data. Therefore, all the security controls and mechanisms that are required at the primary site must be duplicated at the hot site. Access must be controlled, and the organization must be aware of the security methodology implemented by the service organization.
- Also, a hot site may be administratively resource-intensive because controls must be implemented to keep the data up to date and the software patched.
Warm Site
A warm site could best be described as a cross between a hot site and cold site. Like a hot site, the warm site is a computer facility readily available with electrical power, HVAC, and computers, but the applications may not be installed or configured. It may have file/print servers, but not a full complement of workstations. External communication links and other data elements that commonly take a long time to order and install will be present, however.
To enable remote processing at this type of site, workstations will have to be delivered quickly, and applications and their data will need to be restored from backup media.
The advantages to this type of site, as opposed to the hot site, are primarily as follows:
- Cost. This type of configuration will be considerably less expensive than a hot site.
- Location. Because this type of site requires less extensive control and configuration, more flexibility exists in the choice of site.
- Resources. Administrative resource drain is lower than with the maintenance of a hot site.
The primary disadvantage of a warm site, compared to a hot site, is the difference in the amount of time and effort it will take to start production processing at the new site. If extremely urgent critical transaction processing is not needed, this may be an acceptable alternative.
Cold Site
A cold site is the least ready of any of the three choices, but it is probably the most common of the three. A cold site differs from the other two in that it is ready for equipment to be brought in during an emergency, but no computer hardware (servers or workstations) resides at the site. The cold site is a room with electrical power and HVAC, but computers must be brought on-site if needed, and communications links may be ready or not. File and print servers have to be brought in, as well as all workstations, and applications will need to be installed and current data restored from backups.
A cold site is not considered an adequate resource for disaster recovery, because of the length of time required to get it going and all the variables that will not be resolved before the disruptive event. In reality, using a cold site will most likely make effective recovery impossible. It will be next to impossible to perform an in-depth disaster recovery test or to do parallel transaction processing, making it very hard to predict the success of a disaster recovery effort.
There are some advantages to a cold site, however, the primary one being cost. If an organization has very little budget for an alternative backup-processing site, the cold site may be better than nothing. Also, resource contention with other organizations will not be a problem, and geographic location is unlikely to be an issue.
The big problem with this type of site is that having the cold site could engender a false sense of security. But until a disaster strikes, there’s really no way to tell whether it works or not, and by then it will be too late.
TERTIARY SITES
A tertiary site is a secondary backup site which can be used in case the primary backup site (regardless of whether it’s hot, warm, or cold) is not able to handle the recovery process or is completely unavailable. If an organization requires an extremely low MTD, or is not totally comfortable with just one backup site, a tertiary site may be designed and built.
Multiple Centers
A variation on the previously listed alternative sites is called multiple centers, or dual sites. In a multiple-center concept, the processing is spread over several operations centers, creating a distributed approach to redundancy and sharing of available resources. These multiple centers could be owned and managed by the same organization (in-house sites) or used in conjunction with some sort of reciprocal agreement.
The advantages are primarily financial, because the cost is contained. Also, this type of site will often allow for resource and support sharing among the multiple sites. The main disadvantage is the same as for mutual aid: a major disaster could easily overtake the processing capability of the sites. Also, multiple configurations could be difficult to administer.
Service Bureaus
In rare cases, an organization may contract with a service bureau to fully provide all alternate backup-processing services. The big advantages of this type of arrangement are the quick response and availability of the service bureau, the ability to test, and the fact that the service bureau may be available for more than backup. The disadvantages of this type of setup are primarily the expense and resource contention during a large emergency.
Other Data Center Backup Alternatives
There are a few other alternatives to the ones we have previously mentioned. Quite often an organization may use some combination of these alternatives in addition to one of the preceding scenarios.
- Rolling/mobile backup sites - Contracting with a vendor to provide mobile backup services. This may take the form of mobile homes or flatbed trucks with power and HVAC sufficient to stage the alternate processing required. This is considered a cold site variation.
- In-house or external supply of hardware replacements - Vendor resupply of needed hardware, or internal stockpiling of critical components inventory. The organization may have a subscription service with a vendor to send identified critical components overnight. This option may be acceptable for a warm site but is not acceptable for a hot site.
- Prefabricated buildings - It’s not unusual for a company to employ a service organization to construct prefabricated buildings to house the alternate processing functions if a disaster should occur. This is not too different from a mobile backup site - a very cold site.
Transaction Redundancy Implementations
The CISSP candidate should understand the three concepts used to create a level of fault tolerance and redundancy in transaction processing. Although these processes are not used solely for disaster recovery, they are often elements of a larger disaster recovery plan. If one or more of these processes are employed, the ability of a company to get back on-line is greatly enhanced.
- Electronic vaulting. Electronic vaulting refers to the transfer of backup data to an off-site location. This is primarily a batch process of dumping the data through communications lines to a server at an alternate location.
- Remote journaling. Remote journaling refers to the parallel processing of transactions to an alternate site, as opposed to a batch dump process like electronic vaulting. A communications line is used to transmit live data as it occurs. This feature enables the alternate site to be fully operational at all times and introduces a very high level of fault tolerance.
- Database shadowing. Database shadowing uses the live processing of remote journaling, but it creates even more redundancy by duplicating the database sets to multiple servers. See the discussion of redundant servers in the section on Network Availability in Chapter 3.
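The difference between the three concepts above is mainly one of timing and fan-out, which a toy model can make concrete. The sketch below uses in-memory lists as stand-ins for servers; the `Site` class and both function names are hypothetical, and real implementations depend on the database product and communications lines involved.

```python
class Site:
    """In-memory stand-in for a server at some location (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.transactions = []


def process_transaction(primary, transaction, journal_to=()):
    """Apply a transaction at the primary site and forward it live.

    Forwarding to one alternate site models remote journaling; forwarding
    to several approximates database shadowing.
    """
    primary.transactions.append(transaction)
    for site in journal_to:
        site.transactions.append(transaction)


def run_vaulting_batch(primary, vault):
    """Electronic vaulting: dump whatever the off-site server lacks.

    Returns the number of transactions transferred in this batch.
    """
    missing = primary.transactions[len(vault.transactions):]
    vault.transactions.extend(missing)
    return len(missing)


if __name__ == "__main__":
    primary, hot, shadow, vault = (Site(n) for n in
                                   ("primary", "hot", "shadow", "vault"))
    for txn in ("t1", "t2", "t3"):
        process_transaction(primary, txn, journal_to=[hot, shadow])
    print(len(hot.transactions), len(vault.transactions))
    print(run_vaulting_batch(primary, vault))
```

In this model the journaled sites are current after every transaction, while the vault holds nothing until the batch job runs. That gap is the data at risk if the primary site is lost between batches.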
The creation of hot backup sites with remote journaling and tertiary sites can become quite complicated, with layers of multiple protocols, hardware, and software mirroring required. Figure 8-2 shows an organization using a Frame Relay network, mirroring transactions to multiple sites, employing FRNDs (Frame Relay Network Devices) and FRADs (Frame Relay Access Devices).
Figure 8-2: Frame Relay network mirroring to backup sites.
Disaster Recovery Plan Maintenance
Disaster recovery plans become obsolete quickly, for many different reasons. The company may reorganize, and the critical business units may be different from the ones existing when the plan was first created. Most commonly, changes in the network or computing infrastructure may change the location or configuration of hardware, software, and other components. The reasons may also be administrative: complex disaster recovery plans are not easily updated, personnel lose interest in the process, or employee turnover may affect involvement.
Whatever the reason, plan maintenance techniques must be employed from the outset to ensure that the plan remains fresh and usable. It’s important to build maintenance procedures into the organization by using job descriptions that centralize responsibility for updates. Also, create audit procedures that can report regularly on the state of the plan. It’s also important to ensure that multiple versions of the plan do not exist, because they could create confusion during an emergency. Always replace older versions of the text with updated versions throughout the enterprise when a plan is changed or replaced.
Emergency management plans, business continuity plans, and disaster recovery plans should be regularly reviewed, evaluated, modified, and updated. At a minimum, the plan should be reviewed at an annual audit. The plan should also be reevaluated:
- After tests or training exercises, to adjust any discrepancies between the test results and the plan
- After a disaster response or an emergency recovery, as this is an excellent time to amend the parts of the plan that were not effective
- When personnel, their responsibilities, their resources, or organizational structures change, to familiarize new or reorganized personnel with procedures
- When policies, procedures, or infrastructures change
Testing the Disaster Recovery Plan
Testing the disaster recovery plan is very important. Just as a tape backup system cannot be considered working until full restoration tests have been conducted, a disaster recovery plan has many elements that are only theoretical until they have actually been tested and certified. The test plan must be created, and testing must be carried out in an orderly, standardized fashion and be executed on a regular basis.
Also, there are five specific disaster recovery plan–testing types that the CISSP candidate must know (see “The Five Disaster Recovery Plan Test Types” later in this section). Regular disaster recovery drills and tests are a cornerstone of any disaster recovery plan. No demonstrated recovery capability exists until the plan is tested. The tests must exercise every component of the plan for confidence to exist in the plan’s ability to minimize the impact of a disruptive event.
Reasons for Testing
In addition to the general reasons for testing that we have previously mentioned, there are several specific reasons to test, primarily to inform management of the recovery capabilities of the enterprise. Other specific reasons are as follows:
- Testing verifies the accuracy of the recovery procedures and identifies deficiencies.
- Testing prepares and trains the personnel to execute their emergency duties.
- Testing verifies the processing capability of the alternate backup site.
Creating the Test Document
To get the maximum benefit and coordination from the test, a document outlining the test scenario must be produced, containing the reasons for the test, the objectives of the test, and the type of test to be conducted (see the five following types). Also, this document should include granular details of what will happen during the test, including the following:
- The testing schedule and timing
- The duration of the test
- The specific test steps
- Who will be the participants in the test
- The task assignments of the test personnel
- The resources and services required (supplies, hardware, software, documentation, and so forth)
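One way to make sure a test document covers every item listed above is to treat it as a structured record and check it for empty sections before the test is scheduled. The sketch below is only an illustration; the class name `DrpTestDocument` and its field names are assumptions chosen to mirror the list, not a standard template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DrpTestDocument:
    """Hypothetical record of the sections a test document should contain."""

    reasons: str = ""
    objectives: str = ""
    test_type: str = ""
    schedule: str = ""
    duration_hours: float = 0.0
    steps: list = field(default_factory=list)
    participants: list = field(default_factory=list)
    task_assignments: dict = field(default_factory=dict)
    resources: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    doc = DrpTestDocument(
        reasons="Annual audit",
        objectives="Verify restore procedures",
        test_type="Checklist",
    )
    print(doc.missing_sections())
```

A draft that still reports missing sections is not ready to be circulated to the test participants.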
Certain fundamental concepts will apply to the testing procedure. Primarily, the test must not disrupt normal business functions. Also, the test should start with the easy testing types (see the following section) and gradually work up to major simulations after the recovery team has acquired testing skills.
It’s important to remember that the reason for the test is to find weaknesses in the plan. If no weaknesses were found, it was probably not an accurate test. The test is not a graded contest on how well the recovery plan or personnel executing the plan performed. Mistakes will be made, and this is the time to make them. Document the problems encountered during the test and update the plan as needed, and then test again.
TEST YOUR BACKUP REGULARLY!
If you don’t know whether the data can be retrieved quickly and accurately, or if the process has not been tested to your level of comfort, it’s not a working backup. One of us had an experience with a small New York securities firm that was in the middle of merger negotiations. Their primary server crashed, and at that point they discovered that all their backup tapes were blank; although the backup was running, no data was ever written to them. They had never tested the restore procedure. The crash was so severe that external third-party disk data restorers weren’t able to restore much data. Although some paper records existed, the value of the company tanked, and the merger failed.
The same one of us also worked with a major university that had had its e-mail system sabotaged by the recently fired systems administrator, and its backups were rendered useless. It took many weeks to build a new e-mail system, using multiple platforms, and although legal action was successfully initiated against the sysadmin, the VP of IT was forced to resign.
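A restore test of the kind described here can be partly automated by restoring to a scratch location and comparing checksums file by file. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it compares against the live source directory, which is only valid if the source has not changed since the backup was taken. A production procedure would more likely compare against checksums recorded at backup time.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Return relative paths of files that are missing or differ after a restore.

    An empty list means every source file was restored byte for byte.
    """
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or file_digest(src) != file_digest(restored):
            problems.append(rel.as_posix())
    return problems
```

A backup that wrote nothing to the media, as in the first anecdote, would show up immediately: every file would be reported as missing or different.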
The Five Disaster Recovery Plan Test Types
Disaster recovery/emergency management plan testing scenarios have several levels and can be called different things, but there are generally five types of disaster recovery plan tests. The listing here is prioritized, from the simplest to the most complete testing type. As the organization progresses through the tests, each test is progressively more involved and more accurately depicts the actual responsiveness of the company. Some of the testing types, such as the last two, require major investments of time, resources, and coordination to implement. The CISSP candidate should know all of these and what they entail.
The following are the testing types:
- Checklist review. During a checklist type of disaster recovery plan test, copies of the plan are distributed to each business unit’s management. The plan is then reviewed to ensure that the plan addresses all procedures and critical areas of the organization. This is considered a preliminary step to a real test and is not a satisfactory test in itself.
- Table-top exercise or structured walk-through test. In this type of test, members of the emergency management group and business unit management representatives meet in a conference room setting to discuss their responsibilities and how they would react to emergency scenarios by stepping through the plan. The goal is to ensure that the plan accurately reflects the organization’s ability to recover successfully, at least on paper. Each step of the plan is walked through in the meeting and marked as performed. Major glaring faults with the plan should be apparent during the walk-through.
- Walk-through drill or simulation test. The emergency management group and response teams actually perform their emergency response functions by walking through the test, without actually initiating recovery procedures. During a simulation test, all the operational and support personnel expected to perform during an actual emergency meet in a practice session. The goal here is to test the ability of the personnel to respond to a simulated disaster. The simulation goes to the point of relocating to the alternate backup site or enacting recovery procedures, but it does not perform any actual recovery process or alternate processing.
- Functional drill or parallel test. This type tests specific functions such as medical response, emergency notifications, warning and communications procedures, and equipment, although not necessarily all at once. This type of test also includes evacuation drills, in which personnel walk the evacuation route to a designated area where procedures for accounting for the personnel are tested. A parallel test is a full test of the recovery plan, utilizing all personnel. The goal of this type of test is to ensure that critical systems will actually run at the alternate processing backup site. Systems are relocated to the alternate site, parallel processing is initiated, and the results of the transactions and other elements are compared.
- Full-interruption or full-scale exercise. A real-life emergency situation is simulated as closely as possible. This test involves all the participants who would be responding to the real emergency, including community and external organizations. The test may involve ceasing some real production processing. The plan is totally implemented as if it were a real disaster, to the point of involving emergency services (although for a major test, local authorities might be informed and help coordinate).
Table 8-3 lists the five disaster recovery plan testing types in priority.
LEVEL | TYPE | DESCRIPTION |
---|---|---|
1 | Checklist | Copies of plan are distributed to management for review. |
2 | Table-top Exercise | Management meets to step through the plan. |
3 | Simulation | All support personnel meet in a practice execution session. |
4 | Functional Drill | All systems are functionally tested and drills executed. |
5 | Full-Scale Exercise | Real-life emergency situation is simulated. |
Table 8-3: Disaster Recovery Plan Testing Types
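Because the five test types are meant to be worked through from the simplest to the most complete, the ordering can be captured as an ordered enumeration. This sketch is illustrative only; the names `DrpTestType` and `next_test` are assumptions, and the numeric levels simply follow the listing above.

```python
from enum import IntEnum


class DrpTestType(IntEnum):
    """The five test types, numbered from simplest to most complete."""

    CHECKLIST = 1
    TABLE_TOP = 2
    SIMULATION = 3
    FUNCTIONAL_DRILL = 4
    FULL_SCALE = 5


def next_test(completed):
    """Return the simplest test type not yet completed, or None if all are done."""
    for test_type in DrpTestType:
        if test_type not in completed:
            return test_type
    return None


if __name__ == "__main__":
    done = {DrpTestType.CHECKLIST, DrpTestType.TABLE_TOP}
    print(next_test(done).name)
```

A team that has skipped a lower level is sent back to it, which reflects the advice to start with the easy test types and build up to major simulations.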
PLAN VIABILITY
Remember: The functionality of the recovery plan will directly determine the survivability of the organization. The plan shouldn’t be a document gathering dust in the CIO’s bookcase. It has to reflect the actual capability of the organization to recover from a disaster, and therefore needs to be tested regularly.
Disaster Recovery Procedures
This part of the plan details what roles various personnel will take on, what tasks must be implemented to recover and salvage the site, how the company interfaces with external groups, and what financial considerations will arise. Senior management must resist the temptation to participate hands-on in the recovery effort, as these efforts should be delegated. Senior management has many very important roles in the process of disaster recovery, including:
- Remaining visible to employees and stakeholders
- Directing, managing, and monitoring the recovery
- Rationally amending business plans and projections
- Clearly communicating new roles and responsibilities
Information or technology management has more tactical roles to play, such as:
- Identifying and prioritizing mission-critical applications
- Continuously reassessing the recovery site’s stability
- Recovering and constructing all critical data
Monitoring employee morale and guarding against employee burnout during a disaster recovery event is the proper role of human resources. Other emergency recovery tasks associated with human resources could include:
- Providing appropriate retraining
- Monitoring productivity of personnel
- Providing employees and family with counseling and support
The financial area is primarily responsible for:
- Reestablishing accounting processes, such as payroll, benefits, and accounts payable
- Reestablishing transaction controls and approval limits
Isolation of the incident scene should begin as soon as the emergency has been discovered. Authorized personnel should attempt to secure the scene and control access; however, no one should be placed in physical danger to perform these functions. It’s important for life safety that access be controlled immediately at the scene, and only by trained personnel directly involved in the disaster response. Additional injury or exposure to recovery personnel after the initial incident must be prevented.
The Recovery Team
A recovery team will be clearly defined with the mandate to implement the recovery procedures at the declaration of the disaster. The recovery team’s primary task is to get the predefined critical business functions operating at the alternate backup-processing site.
Among the many tasks the recovery team will have will be the retrieval of needed materials from off-site storage - that is, backup tapes, media, workstations, and so on. When this material has been retrieved, the recovery team will install the necessary equipment and communications. The team will also install the critical systems, applications, and data required for the critical business units to resume working.
The Salvage Team
A salvage team, separate from the recovery team, will be dispatched to return the primary site to normal processing environmental conditions. It’s advisable to have a different team, because this team will have a different mandate from the recovery team. They are not involved with the same issues the recovery team is concerned with, such as creating production processing and determining the criticality of data. The salvage team has the mandate to quickly and, more importantly, safely clean, repair, salvage, and determine the viability of the primary processing infrastructure after the immediate disaster has ended.
Clearly, this cannot begin until all possibility of personal danger has ended. Firefighters or police might control the return to the site. The salvage team must identify sources of expertise, equipment, and supplies that can make the return to the site possible. The salvage team supervises and expedites the cleaning of equipment or storage media that may have suffered from smoke damage, the removal of standing water, and the drying of water-damaged media and papers.
This team is often also given the authority to declare when the site is up and running again - that is, when the resumption of normal duties can begin at the primary site. This responsibility is large, because many elements of production must be examined before the green light is given to the recovery team that operations can return.
Normal Operations Resume
This job is normally the task of the recovery team, or another, separate resumption team may be created. The plan must have full procedures on how the company will return production processing from the alternate site to the primary site with the minimum of disruption and risk. It’s interesting to note that the steps to resume normal processing operations will be different from the steps in the recovery plan; that is, the least critical work should be brought back first to the primary site.
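The reversal of priorities between recovery and resumption can be shown with a small sorting sketch. It assumes each business function carries a rating class such as those in Table 8-2, and that recovery at the alternate site starts with the most critical functions; both function names are hypothetical.

```python
# Lower rank means more critical; the class labels follow Table 8-2.
CRITICALITY_RANK = {"AAA": 0, "AA": 1, "A": 2, "B": 3, "C": 4, "D": 5}


def recovery_order(functions):
    """Order functions for recovery at the alternate site: most critical first."""
    return sorted(functions, key=lambda name: CRITICALITY_RANK[functions[name]])


def resumption_order(functions):
    """Order functions for return to the primary site: least critical first."""
    return sorted(functions, key=lambda name: CRITICALITY_RANK[functions[name]],
                  reverse=True)


if __name__ == "__main__":
    functions = {"payroll": "B", "trading": "AAA", "archive": "D"}
    print(recovery_order(functions))
    print(resumption_order(functions))
```

Moving the least critical work back first means any problems at the restored primary site surface on functions the business can best afford to have interrupted.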
WHEN IS A DISASTER OVER?
When is a disaster over? The answer is very important. The disaster is not over until all operations have been returned to their normal location and function. A very large window of vulnerability exists when transaction processing returns from the alternate backup site to the original production site. The disaster can be officially called over only when all areas of the enterprise are back to normal in their original home, and all data has been certified as accurate.
It’s important to note that the emergency is not over until all operations are back in full production mode at the primary site. Reoccupying the site of a disaster or emergency should not be undertaken until a full safety inspection has been done. Ideally the investigation into the cause of the emergency has been completed and all damaged property has been salvaged and restored before returning. During and after an emergency, the safety of personnel must be monitored, any remaining hazards must be assessed, and security must be maintained at the scene. After all safety precautions have been taken, an inventory of damaged and undamaged property must be done to begin salvage and restoration tasks. Also, the site must not be reoccupied until all on-site investigative processes have been completed. Detailed records must be kept of all disaster-related costs, and valuations must be made of the effect of the business interruption.[*]
All elements discussed here involve well-coordinated logistical plans and resources. To manage and dispatch a recovery team, a salvage team, and perhaps a resumption team is a major effort, and the short descriptions we have here should not give the impression that it is not a very serious task.
Other Recovery Issues
Several other issues must be discussed as important elements of a disaster scenario:
- Interfacing with external groups
- Employee relations
- Fraud and crime
- Financial disbursement
- Media relations
When an emergency occurs that could potentially have an impact outside the facility, the public must be informed, regardless of whether there is any immediate threat to public safety. The disaster recovery plan should include determinations of the audiences that may be affected by an emergency and procedures to communicate with them. Information the public will want to know could include public safety or health concerns, the nature of the incident, the remediation effort, and future prevention steps. Common audiences for information could include:
- The media
- Unions and contractors
- Shareholders
- Neighbors
- Employees’ families and retirees
Since the media is such an important link to the public, disaster plans and tests must contain procedures for addressing the media and communicating important information. A trained spokesperson should be designated, and established communications procedures should be prepared. Accurate and approved information should be released in a timely manner, without speculation, blame, or obfuscation.
Interfacing with External Groups
Quite often the organization may be well equipped to cope with a disaster in relation to its own employees, but it overlooks its relationship with external parties. The external parties could be municipal emergency groups such as police, fire, EMS, medical, or hospital staff; they could be civic officials, utility providers, the press, customers, or shareholders. How all personnel, from senior management on down, interact with these groups will impact the success of the disaster recovery effort. The recovery plan must clearly define steps and escalation paths for communications with these external groups.
One of the elements of the plan will be to identify how close the operations site is to emergency facilities: medical (hospital, clinic), police, and fire. The timeliness of the response of emergency groups will have a bearing on implementation of the plan when a disruptive event occurs.
Employee Relations
Another important facet of the disaster recovery plan is how the organization manages its relationship with its employees and their families. In the event of a major life- or safety-endangering event, the organization has an inherent responsibility to its employees (and families, if the event is serious enough). The organization must make preparations to be able to continue salaries even when business production has stopped. This salary continuance may be for an extended period of time, and the company should be sure its insurance can cover this cost, if needed. Also, the employees and their families may need additional funds for various types of emergency assistance for relocation or extended living support, as can happen with a major natural event such as an earthquake or flood.
Fraud and Crime
Other problems related to the event may crop up. Beware of those individuals or organizations that may seek to capitalize financially on the disaster by exploiting security concerns or other opportunities for fraud. In a major physical disaster, vandalism and looting are common occurrences. The plan must consider these contingencies.
Financial Disbursement
An often-overlooked facet of disaster recovery is expense disbursement. Procedures for storing signed, authorized checks off-site must be considered in order to facilitate financial reimbursement. The plan must also address the possibility that the expenses incurred during the event may exceed the emergency manager’s authority.
Media Relations
The media figure prominently in any disaster recovery scenario, so the plan must address dealing with the media and with civic officials. The organization should prepare an established, unified response to be delivered by a credible, trained, informed spokesperson. The company should be accessible to the media so that they don’t go to other sources; report your own bad news so as not to appear to be covering up. Tell the story quickly, openly, and honestly to avoid suspicion or rumors. Before the disaster, as part of the plan, determine the appropriate clearance and approval processes for releasing information to the media. It’s important to take control of the dissemination of the story early in the course of the event.
[*]Source: FEMA, “Emergency Management Guide for Business and Industry,” August 1998.
Assessment Questions
You can find the answers to the following questions in Appendix A.
1. Which of the following choices is the first priority in an emergency?
2. Which of the following choices is not considered an appropriate role for senior management in the business continuity and disaster recovery process?
3. Why is it so important to test disaster recovery plans frequently?
4. Which of the following types of tests of disaster recovery/emergency management plans is considered the most cost-effective and efficient way to identify areas of overlap in the plan before conducting more demanding training exercises?
5. Which type of backup subscription service will allow a business to recover quickest?
6. Which of the following represents the most important first step in creating a business resumption plan?
7. What could be a major disadvantage to a mutual aid or reciprocal type of backup service agreement?
8. In developing an emergency or recovery plan, which of the following would not be considered a short-term objective?
9. When is the disaster considered to be officially over?
10. When should the public and media be informed about a disaster?
11. What is the number one priority of disaster response?
12. Which of the following is the best description of the criticality prioritization goal of the Business Impact Assessment (BIA) process?
13. Which of the following most accurately describes a business impact analysis (BIA)?
14. What is considered the major disadvantage to employing a hot site for disaster recovery?
15. Which of the following is not considered an appropriate role for Financial Management in the business continuity and disaster recovery process?
16. Which of the following is the most accurate description of a warm site?
17. Which of the following is not one of the five disaster recovery plan testing types?
18. Which of the following choices is an example of a potential hazard due to a technological event, rather than a human event?
19. Which of the following is not considered an element of a backup alternative?
20. Which of the following choices refers to a business asset?
21. Which of the following statements is not correct regarding the role of the recovery team during the disaster?
22. Which of the following choices is incorrect regarding when a BCP, DRP, or emergency management plan should be evaluated and modified?
23. When should security isolation of the incident scene start?
24. Which of the following is not a recommended step to take when resuming normal operations after an emergency?
25. Which of the following would not be a good reason to test the disaster recovery plan?
26. Which of the following statements is not true about the post-disaster salvage team?
27. Which of the following is the most accurate statement about the results of the disaster recovery plan test?
28. Which statement is true regarding the disbursement of funds during and after a disruptive event?
29. Which statement is true regarding company/employee relations during and after a disaster?
30. Which of the following choices is the correct definition of a Mutual Aid Agreement?
31. Which of the following most accurately describes a business continuity program?
32. Which of the following would best describe a cold backup site?
33. Which of the following would best describe a tertiary site?
Answers
1. Answer: c Life safety, or protecting the health and safety of everyone in the facility, is the first priority in an emergency or disaster.
2. Answer: d The tactical assessment of information security is a role of information management or technology management, not senior management.
3. Answer: b A plan is not considered functioning and viable until a test has been performed. An untested plan sitting on a shelf is useless and might even have the reverse effect of creating a false sense of security. Although the other answers, especially a, are good reasons to test, b is the primary reason.
4. Answer: c In a table-top exercise, members of the emergency management group meet in a conference room setting to discuss their responsibilities and how they would react to emergency scenarios.
5. Answer: a Warm and cold sites require more work after the event occurs to get them to full operating functionality. A mobile backup site might be useful for specific types of minor outages, but a hot site is still the main choice of backup processing site.
6. Answer: b The business resumption, or business continuity plan, must have total, highly visible senior management support.
7. Answer: c The site might not have the capacity to handle the operations required during a major disruptive event. Mutual aid might be a good system for sharing resources during a small or isolated outage, but a major natural or other type of disaster can create serious resource contention between the two organizations, both of which may be affected simultaneously.
8. Answer: d The organization’s strategic plan is considered a long-term goal.
9. Answer: c The disaster is officially over when all the elements of the business have returned to normal functioning at the original site. It’s important to remember that a threat to continuity exists when processing is being returned to its original site after salvage and cleanup have been done.
10. Answer: a When an emergency occurs that could potentially have an impact outside the facility, the public must be informed, regardless of whether there is any immediate threat to public safety.
11. Answer: b The number one function of all disaster response and recovery is the protection of the safety of people; all other concerns are vital to business continuity but are secondary to personnel safety.
12. Answer: a The three primary goals of a BIA are criticality prioritization, maximum downtime estimation, and identification of critical resource requirements. Answer d is a distracter.
13. Answer: b A business impact analysis (BIA) measures the effect of resource loss and escalating losses over time in order to provide the entity with reliable data upon which to base decisions on hazard mitigation and continuity planning. Answer a is a definition of a disaster/emergency management program. Answer c describes a mutual aid agreement. Answer d is the definition of a recovery program.
14. Answer: b A hot site is commonly used for those extremely time-critical functions that the business must have up and running to continue operating, but the expense of duplicating and maintaining all the hardware, software, and application elements is a serious resource drain to most organizations.
15. Answer: b Monitoring employee morale and guarding against employee burnout during a disaster recovery event is the proper role of human resources.
16. Answer: b
17. Answer: c
18. Answer: b A financial collapse is considered a technological potential hazard, whereas the other three are human events.
19. Answer: d A checklist is a type of disaster recovery plan test. Electronic vaulting is the batch transfer of backup data to an offsite location. Remote journaling is the parallel processing of transactions to an alternate site. A warm site is a backup processing alternative.
20. Answer: c Answer a is a definition for a threat. Answer b is a description of mitigating factors that reduce the effect of a threat, such as an uninterruptible power supply (UPS), sprinkler systems, or generators. Answer d is a distracter.
21. Answer: a The recovery team performs different functions from the salvage team. The recovery team’s primary mandate is to get critical processing reestablished at an alternate site. The salvage team’s primary mandate is to return the original processing site to normal processing environmental conditions.
22. Answer: a Emergency management plans, business continuity plans, and disaster recovery plans should be regularly reviewed, evaluated, modified, and updated. At a minimum, the plan should be reviewed at an annual audit.
23. Answer: a Isolation of the incident scene should begin as soon as the emergency has been discovered.
24. Answer: a Reoccupying the site of a disaster or emergency should not be undertaken until a full safety inspection has been done, an investigation into the cause of the emergency has been completed, and all damaged property has been salvaged and restored.
25. Answer: b The other three answers are good reasons to test the disaster recovery plan.
26. Answer: a Salvage cannot begin until all physical danger has been removed or mitigated and emergency personnel have returned control of the site to the organization.
27. Answer: c The purpose of the test is to find weaknesses in the plan. Every plan has weaknesses. After the test, all parties should be advised of the results, and the plan should be updated to reflect the new information.
28. Answer: d Authorized, signed checks should be stored securely off-site for access by lower-level managers in the event senior-level or financial management is unable to disburse funds normally.
29. Answer: a The organization has an inherent responsibility to its employees and their families during and after a disaster or other disruptive event. The company must be insured to the extent it can properly compensate its employees and families. Conversely, employees do not have the right to obtain compensatory damages fraudulently if the organization cannot compensate.
30. Answer: c A mutual aid agreement is used by two or more parties to provide for assistance if one of the parties experiences an emergency. Answer a describes a business continuity plan. Answer b describes a damage assessment, and answer d describes risk mitigation.
31. Answer: a A business continuity program is an ongoing process supported by senior management and funded to ensure that the necessary steps are taken to identify the impact of potential losses, maintain viable recovery strategies and recovery plans, and ensure continuity of services through personnel training, plan testing, and maintenance. Answer b describes a disaster/emergency management program. Answer c describes a damage assessment. Answer d is a distracter.
32. Answer: b A computer facility with electrical power and HVAC, with workstations and servers not present (but available to be brought on-site when the event begins) and no applications installed, is a cold site. Answer a is a hot site, and d is a warm site. Answer c is just an empty room.
33. Answer: b A “tertiary site” is a secondary backup site that can be used in case the primary backup site is not available.