Business Continuity Plan (BCP)
The BCP is developed to prevent interruptions to normal business. If these events cannot be prevented, the goal of the plan is to minimize the outage. The plan should also be designed to reduce the costs that such disruptions impose on an organization and to mitigate the associated risk. The BCP process as defined by the ISC2 has the following five steps:
1. Project management and initiation
2. Business impact analysis
3. Recovery strategy
4. Plan design and development
5. Testing, maintenance, awareness, and training
Each of these is discussed in the following sections.
Project Management and Initiation
Before the BCP process can begin, you need to make your case to management. You have to establish the need for the BCP. One way to start is to perform a risk analysis to identify and document potential outages to critical systems. The results should be presented to management so they understand the potential risk. That's a good time to remind them that, ultimately, they are responsible. Customers, shareholders, or anyone else could bring a civil suit against senior management if they feel that management has not practiced due care. If you don't get management's support, you will not have the funds to complete the project, and it will be marginally successful, if at all.
With management on board, you can start to develop a plan of action. This management plan should include the following:
- Scope of the project: A properly defined scope is a tremendous help in ensuring that an effective BCP is devised. At this point in the process, the decision to do only a partial recovery or a full recovery is made. In larger organizations, office politics can pull the project in directions that it does not need to go. Another problem is project creep, which occurs when items are added to the plan that were not part of the original project plan. This can delay completion of the project or cause it to run over budget.
- Appointment of a project planner: The project planner is a key role because this person drives the process. The project planner must ensure that all elements of the plan are properly addressed and that a sufficient level of research, planning, and analysis has been performed before the plan begins. This individual must also have enough credibility with senior management to influence them when the time comes to present the results and recommendations.
- Determination of who will be on the team: The team should include representatives from senior management, the legal staff, recovery team leaders, the information security department, various business units, networking, and physical security. You want to make sure that the individuals who would be responsible for executing the plan are involved in the development process.
- Finalization of the project plan: This step is similar to traditional project plan phases. The team leader and the team must finalize issues such as needed resources (personnel, financial), time schedules, budget estimates, and critical success factors. Meeting schedules and BCP completion dates are two critical items that must be addressed at this point.
- Determination of the data-collection method: Different tools can be used to gather the data. Strohl Systems BIA Professional and SunGard's Paragon software can automate much of the BCP process. If you choose to use these tools, be sure to add time to your schedule. A learning curve is involved anytime individuals are introduced to software they are not familiar with.
Business Impact Analysis (BIA)
The BIA is the second step of the process. Its role is to describe what impact a disaster would have on critical business functions. The BIA is an important step in the process because it looks at the threats to these functions and the costs of a potential outage. As an example, the BIA might uncover the fact that a DoS attack that causes 2 hours of downtime of the company's VoIP phone system will result in $28,000 in lost revenue, whereas an 8-hour outage of the web server might cost the company only $1,000 in lost revenue. These types of numbers will help the organization determine what needs to be done to ensure the survival of the company. The eight steps in the BIA process are as follows:
1. Select individuals to interview.
2. Determine the methods to be used for gathering information.
3. Develop a customized questionnaire to gather specific monetary and operational impact information. This should include questions that inquire about both quantitative and qualitative losses. The goal is to use this data to help determine how the loss of any one function would affect the organization.
4. Analyze the compiled data.
5. Determine the time-critical business processes and functions.
6. Determine maximum tolerable downtimes for each process and function.
7. Prioritize the critical business processes and functions based on their maximum tolerable downtime (MTD).
8. Document the findings and report your recommendations to management.
MTD is a measurement of the longest time that an organization can survive without a specific business function. MTD estimates include critical (minutes to hours), urgent (24 hours or less), important (up to 72 hours), average (up to 7 days), and nonessential (up to 30 days).
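As a rough illustration of steps 6 and 7, the following Python sketch assigns each function an MTD category and orders functions so that the shortest MTD comes first. The function names and MTD values are hypothetical, and the 4-hour cutoff for "critical" is an assumption, because the text describes that category only as "minutes to hours".

```python
from dataclasses import dataclass

# Upper bounds, in hours, for each MTD category, most urgent first.
# The 24-hour, 72-hour, 7-day, and 30-day limits come from the text.
# The 4-hour cutoff for "critical" is an ASSUMPTION for illustration.
MTD_CATEGORIES = [
    ("critical", 4),
    ("urgent", 24),
    ("important", 72),
    ("average", 7 * 24),
    ("nonessential", 30 * 24),
]


@dataclass
class BusinessFunction:
    name: str         # hypothetical function name
    mtd_hours: float  # maximum tolerable downtime, in hours


def classify_mtd(mtd_hours: float) -> str:
    """Return the first category whose upper bound covers mtd_hours."""
    for label, limit in MTD_CATEGORIES:
        if mtd_hours <= limit:
            return label
    # Anything that tolerates more than 30 days is treated as nonessential.
    return "nonessential"


def prioritize(functions):
    """Order functions so the shortest MTD (most urgent) comes first."""
    return sorted(functions, key=lambda f: f.mtd_hours)


# Hypothetical sample data, not taken from the text.
functions = [
    BusinessFunction("payroll", 96),
    BusinessFunction("order processing", 2),
    BusinessFunction("email", 24),
    BusinessFunction("archive retrieval", 600),
]

ranked = [(f.name, classify_mtd(f.mtd_hours)) for f in prioritize(functions)]
for name, category in ranked:
    print(f"{name}: {category}")
```

In practice the category boundaries would be set by the BCP team from interview data rather than hard-coded.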
The impact or loss that an organization faces because of lost service or data can be felt in many ways. These are generally measured by one of the following:
- Allowable business interruption: What is the maximum tolerable downtime (MTD) the organization can survive without that function or service?
- Financial and operational considerations: What will this outage cost? Will there be a loss of revenue or operational capital, or will we be held personally liable? Cost can be immediate or delayed. Other potential costs include any losses incurred because of failure in meeting the SLA requirements of customers.
- Regulatory requirements: What violations of law or regulations could this cause? Is there a legal penalty?
- Organizational reputation: Will this affect our competitive advantage, market share, or reputation?
The BIA builds the groundwork for determining how resources should be appropriated for recovery-planning efforts.
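To compare outages of different lengths, it can help to normalize each loss to a per-hour rate. The sketch below does this for the VoIP and web server figures used in the BIA example earlier in this section. It assumes that losses accrue evenly over the outage, which is a simplification, and the helper name `hourly_loss` is illustrative only.

```python
def hourly_loss(total_loss: float, outage_hours: float) -> float:
    """Average revenue lost per hour, ASSUMING losses accrue evenly."""
    if outage_hours <= 0:
        raise ValueError("outage_hours must be positive")
    return total_loss / outage_hours


# (lost revenue in dollars, outage length in hours) from the BIA example.
outages = {
    "VoIP phone system": (28_000, 2),
    "web server": (1_000, 8),
}

rates = {name: hourly_loss(loss, hours) for name, (loss, hours) in outages.items()}

# List the most expensive outage per hour first.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ${rate:,.2f} per hour")
```

On these figures the phone system loses $14,000 per hour against $125 per hour for the web server, which is the kind of gap that justifies directing recovery resources to the phone system first.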
A vulnerability assessment is often part of a BIA. Although the assessment is somewhat similar to the risk-assessment process discussed in Chapter 3, "Security-Management Practices," this one focuses on providing information that is used just for the business continuity plan.
Recovery Strategy
Recovery strategies are the predefined actions that management has approved to be followed in case normal operations are interrupted. Operations can be interrupted in several different ways:
- Data interruptions: The focus here is on recovering the data. Solutions to data interruptions include backups, offsite storage, and remote journaling.
- Operational interruptions: The interruption is caused by the loss of some type of equipment. Solutions to this type of interruption include hot sites, redundant equipment, Redundant Array of Inexpensive Disks (RAID), and Backup Power Supplies (BPS).
- Facility and supply interruptions: Causes of these interruptions can include fire, loss of inventory, transportation problems, heating, ventilation, and air conditioning (HVAC) problems, and telecommunications failures.
- Business interruptions: These interruptions can be caused by strikes or by the loss of personnel, critical equipment, supplies, or office space.
To evaluate the losses that could occur from any of these interruptions and determine the best recovery strategy, follow these steps:
1. Document all costs for each possible alternative.
2. Obtain cost estimates for any outside services that might be needed.
3. Develop written agreements with the chosen vendor for such services.
4. Evaluate what resumption strategies are possible in case there is a complete loss of the facility.
5. Document your findings and report your chosen recovery strategies to management for feedback and approval.
Plan Design and Development
In this phase, the team prepares and documents a detailed plan for recovery of critical business systems. The plan should be a guide for implementation. The plan should include information on both long-term and short-term goals and objectives:
- Identify critical functions and priorities for restoration.
- Identify support systems that are needed by critical functions.
- Estimate potential disasters and calculate the minimum resources needed to recover from the catastrophe.
- Select recovery strategies and determine what vital personnel, systems, and equipment will be needed to accomplish the recovery.
- Determine who will manage the restoration and testing process.
- Calculate what type of funding and fiscal management is needed to accomplish these goals.
The plan should also detail how the organization will interface with external groups, such as customers, shareholders, the media, the community, and regional and state emergency services groups. The final step of the phase is to combine this information into the BCP and integrate it with the organization's other emergency plans.
Testing, Maintenance, Awareness, and Training
This final phase of the process is for testing and maintaining the BCP. Training and awareness programs are also developed at this point. Testing the disaster-recovery plan is critical. Without performing a test, there is no way to know whether the plan will work. Testing helps make theoretical plans reality. As a CISSP candidate, you should be aware of the five different types of BCP testing:
- Checklist: Although this is not considered a replacement for a real test, it is a good start. A checklist test is performed by sending copies of the plan to different department managers and business unit managers for review. Each recipient can review the plan to make sure nothing was overlooked.
- Tabletop: A tabletop test is performed by having the members of the emergency management team and business unit managers meet in a conference to discuss the plan. The plan then is "walked through" line by line. This gives all attendees a chance to see how an actual emergency would be handled and to discover dependencies. By reviewing the plan in this way, some errors or problems should become apparent.
The primary advantage of the tabletop testing method is to discover dependencies between different departments.
- Walkthrough: This is an actual simulation of the real thing. This drill involves members of the response team acting in the same way as if there had been an actual emergency. This test proceeds to the point of recovery or to relocation to the alternative site. The primary purpose of this test is to verify that members of the response team can perform the required duties.
- Functional: A functional test is similar to a walkthrough but actually starts operations at the alternative site. Operations of the new and old sites can be run in parallel.
- Full interruption: This test is the most detailed, time-consuming, and thorough. A full interruption test mimics a real disaster, and all steps are performed to start up backup operations. It involves all the individuals who would be involved in a real emergency, including internal and external organizations.
The CISSP exam will require you to know the differences between the BCP test types. You should also note the advantages and disadvantages of each.
When the testing process is complete, a few additional items still need to be done. The organization must put controls in place to maintain the current level of business continuity and disaster recovery. This is best accomplished by implementing change-management procedures. If changes are required to the approved plans, you will then have a documented, structured way to make them. A centralized command and control structure eases this burden. Controls also should be built into the procedures to allow for periodic retesting. Life is not static, and neither should the organization's BCP be. The individuals responsible for specific parts of the BCP process are listed in Table 9.1.
Person or Department | Responsibility |
---|---|
Senior management | Project initiation, ultimate responsibility, overall approval and support |
Midmanagement or business unit managers | Identification and prioritization of critical systems |
BCP committee and team members | Planning, day-to-day management, implementation and testing of the plan |
Functional business units | Plan implementation, incorporation, and testing |
Senior management is ultimately responsible for the BCP. This includes project initiation, overall approval, and support.
Awareness and Training
The goal of awareness and training is to make sure all employees know what to do in case of an emergency. If employees are untrained, they might simply stop what they're doing and run for the door anytime there's an emergency. Even worse, they might not leave when an alarm has sounded and they have been instructed to leave because of possible danger. Therefore, the organization should design and develop training programs to make sure each employee knows what to do and how to do it. Employees assigned to specific tasks should be trained to carry out the needed procedures. Plan for cross-training of teams, if possible, so that team members are familiar with a variety of recovery roles and responsibilities.
The number one priority of any BCP or DRP is to protect the safety of employees.