Upgrading and Repairing Servers
Eventually every system fails, and occasionally systems fail in very unpredictable ways, such as the following:
All the aforementioned qualify as disasters, and we all pray that we are spared problems like these. However, good practice requires organizations to plan for disasters so that they can keep critical business processes working or recover from these problems as quickly and as cost-effectively as possible.
The Purpose of Disaster Recovery Planning
When disaster strikes, your first inclination when trying to address the problem is likely to be wrong and may in fact make things worse. In an emergency, there's a very strong tendency to act first and think second. A well-thought-out disaster recovery plan is an essential component of any well-run computing center. In some locales, planning for business continuity is not just good business practice; it is the law.
The purpose of disaster planning is to codify a set of rules and actions that are to be followed when a problem occurs. Disaster recovery takes time to plan, and if disaster doesn't strike, it's a cost in time and salaries you might be tempted to forgo. A plan is a lot like insurance: If you never need it, it looks like a waste of money, but when you do need it, it can save your organization a considerable amount of money, improve the quality of the recovery, and greatly diminish your downtime.
Disaster recovery planning is part of an overall fault tolerance strategy. It's where you put all your backup systems to use and test your strategies. You don't have time to fix deficiencies in your systems once a disaster occurs, so it is absolutely critical that every system you count on for recovery be tested beforehand. Just as is the case for data backups themselves, it is absolutely critical that you know the integrity of your systems by doing the following:
There are many different tests that you can perform to check the viability of your recovery plan. The point is that none of them does any good if you first run it at the time a disaster strikes. So as part of your recovery plan, you need a set of regular action items to test the systems you count on.
An Example of a Disaster Recovery Plan
A disaster recovery plan should be an ongoing effort that results in a working document. Every year, at the appointed time, the document should be brought out and revised. People at every significant level of the organization should review and sign off on the plan. Disaster planning is not just an IT exercise; the level of loss that an organization is willing to endure or the amount of money that an organization is willing to pay to avoid a loss is really a business decision. The decisions made in the disaster recovery plan should be backed by a reasonable calculation that quantifies them.
A disaster recovery plan should be written in the same way that any project plan is written. The plan should start with a clear and concise description of its purpose, there should be a table of contents for the issues the plan covers, and each issue should be written up in its own section. Sections should not only describe the issue and its potential solution(s) but also designate who is responsible for action.
Note: A disaster plan needs to be readily accessible when needed. If your disaster plan is found only on a computer system, it isn't going to do you any good if that system goes down. A disaster plan should be a paper document that is stored with emergency equipment, with a copy or set of copies stored offsite in logical locations.
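As a rough illustration of the kind of calculation that can quantify a planning decision, the sketch below compares the expected yearly loss from one disaster scenario with the yearly cost of mitigating it. The `Scenario` fields, the function names, and every figure are hypothetical and chosen only for illustration; a real plan would use estimates agreed on by the business.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One disaster scenario. All values are illustrative assumptions."""
    name: str
    loss_per_event: float    # cost of one occurrence with no mitigation
    mitigated_loss: float    # cost of one occurrence with mitigation in place
    events_per_year: float   # expected frequency (0.25 = once every four years)
    mitigation_cost: float   # yearly cost of the mitigation


def expected_annual_loss(loss_per_event: float, events_per_year: float) -> float:
    """Expected yearly loss: cost of one event times its expected frequency."""
    return loss_per_event * events_per_year


def net_benefit(s: Scenario) -> float:
    """Yearly loss avoided by the mitigation, minus what the mitigation costs."""
    avoided = (expected_annual_loss(s.loss_per_event, s.events_per_year)
               - expected_annual_loss(s.mitigated_loss, s.events_per_year))
    return avoided - s.mitigation_cost


# Hypothetical example: a standby server for a critical database.
standby = Scenario(
    name="database server failure",
    loss_per_event=500_000,
    mitigated_loss=50_000,
    events_per_year=0.25,
    mitigation_cost=30_000,
)
print(f"{standby.name}: net yearly benefit {net_benefit(standby):,.0f}")
```

A positive result suggests the mitigation pays for itself on average; a negative one means the organization is paying more than the expected loss, which may still be a legitimate business decision if the loss would be intolerable.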
Disaster recovery plans are a little different from a lot of other project plans. They don't include definite timelines, although they may specify how long operations should take. They must also specify how problems are identified and how to escalate actions to the next level when issues aren't resolved. A good disaster recovery plan should include flowcharts that illustrate how actions should flow. Table 21.3 shows an example of the parts of a disaster recovery plan.
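The escalation rule described above can be made concrete with a simple ladder: each level gets a fixed amount of time to resolve the issue before it passes to the next. The following is only a minimal sketch; the roles, the time limits, and the names `EscalationLevel`, `LADDER`, and `current_owner` are all hypothetical, and a real plan would list named people and contact details.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EscalationLevel:
    role: str             # who owns the problem at this level (hypothetical)
    minutes_allowed: int  # time this level has before the issue escalates


# Hypothetical ladder, ordered from first responder to final authority.
LADDER = (
    EscalationLevel("on-call administrator", 30),
    EscalationLevel("IT manager", 60),
    EscalationLevel("operations director", 120),
)


def current_owner(minutes_unresolved: int,
                  ladder: Sequence[EscalationLevel] = LADDER) -> str:
    """Return the role that should own an issue open for the given time."""
    deadline = 0
    for level in ladder:
        deadline += level.minutes_allowed
        if minutes_unresolved < deadline:
            return level.role
    # Past every deadline: the top of the ladder keeps ownership.
    return ladder[-1].role


for minutes in (10, 45, 200):
    print(minutes, "minutes unresolved ->", current_owner(minutes))
```

Writing the ladder down as data rather than prose makes it easy to review during the yearly revision and to turn into the flowchart the plan should contain.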
With a fully developed disaster recovery plan, if and when a disaster strikes, you will be in a much better position to minimize the damage, contain the costs, and bring your systems and services back online much more quickly. At times of great difficulty, it is best not to have to spend time thinking through complex responses.
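One of the regular test items discussed earlier, confirming that backups are still intact before they are needed, could be automated along the following lines. This is a sketch under stated assumptions, not a prescribed procedure: it assumes backups are ordinary files in a directory, and the manifest format and the function names `write_manifest` and `verify_backup` are invented for illustration.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backup files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """At backup time, record a checksum for every file in the backup.

    The manifest is assumed to live outside backup_dir.
    """
    lines = []
    for item in sorted(backup_dir.rglob("*")):
        if item.is_file():
            relative = item.relative_to(backup_dir).as_posix()
            lines.append(f"{sha256_of(item)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_backup(backup_dir: Path, manifest: Path) -> List[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        expected, relative = line.split("  ", 1)
        target = backup_dir / relative
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative)
    return problems
```

Run on a schedule, `verify_backup` returns an empty list when every recorded file is present and unchanged. A matching checksum shows only that the stored files are intact; it does not replace a periodic trial restore.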