Data Protection and Information Lifecycle Management
The twin forces of regulation and cost control have changed the way IT managers look at data. The growing awareness that money is being spent on unimportant data has driven changes in how data is managed. At the same time, regulators and lawmakers throughout the world have burdened organizations with data retention requirements. Failure to comply with these requirements can bring about fines, lawsuits, and even prison terms.

There has always been a sense that old data should be archived or removed. Most organizations had procedures, some formal and some ad hoc, for removing old data from online storage. These common practices have been extrapolated into a formal process called Data Lifecycle Management (DLM). Data Lifecycle Management describes how data is treated at different points in time. The policies for data management change as data ages and changes (Figure 7-1). These policies can then be translated into rules or scripts for applications that automate the policy.

Figure 7-1. General data lifecycle model
As is the case with all policies, each organization must define the lifecycle for its data. There is a general model, however, that most data will follow. The lifecycle of data is defined by how often the data is accessed. Data is most useful, and accessed most often, shortly after it is created. Data created by transaction processing applications and word processors alike has the most value shortly after its creation. At this stage, the data must be kept online and available all the time. As the data gets older, the need to access it immediately diminishes. Data is still kept online, but guaranteed access time is no longer important; users can wait some time to get it, if necessary. When the data is older still, the need to keep it online at all decreases until it can be removed from online storage altogether. Finally, when the data is no longer useful, or when having it represents a liability to the organization, it is destroyed.

Data Lifecycle Management and Data Protection
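The general model can be pictured as a simple mapping from the age of data (a stand-in for how often it is accessed) to a lifecycle stage. The Python sketch below is only an illustration of that idea; the stage names and age thresholds are assumptions chosen for the example, not values taken from the text.

    from datetime import datetime, timezone
    from typing import Optional

    # Hypothetical lifecycle stages; the age thresholds are illustrative only.
    STAGES = [
        (90,   "online, guaranteed access"),    # newly created, heavily accessed data
        (365,  "online, relaxed access time"),  # still online, but users can wait for it
        (1095, "archive"),                      # rarely accessed; removable or offline media
    ]

    def lifecycle_stage(created: datetime, now: Optional[datetime] = None) -> str:
        """Return the lifecycle stage for data created at 'created'."""
        now = now or datetime.now(timezone.utc)
        age_days = (now - created).days
        for max_age, stage in STAGES:
            if age_days <= max_age:
                return stage
        return "destroy"  # no longer useful, or a liability: remove it entirely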
Data Lifecycle Management is intertwined with the data protection policies of an organization. Data protection policies must take into account the lifecycle of the data to use resources cost-effectively. Otherwise, data that is unimportant will be given high levels of protection, resulting in a higher cost than is necessary. Conversely, data that is extremely important may not have adequate levels of protection because of resource constraints.

Data Lifecycle Management policies also have to take into account data protection policies and systems. If the two are not synchronized, it is likely that the policies will be in conflict. It is possible to comply with a Data Lifecycle Management policy that insists that aged data be moved to less expensive storage while violating data protection policies that say all data must be protected to a high standard. Data protection systems by nature copy data to various locations on a network or off-site. This may conflict with Data Lifecycle Management policies commanding that data be completely destroyed. By including Data Lifecycle Management as part of data protection policies, conflicts can be averted, and a more cost-effective data protection system can be implemented.

DLM Policies
Data Lifecycle Management policies are similar to data protection policies. The major difference is that the lifecycle of the data is taken into account when moving, destroying, or copying data. As data ages, Data Lifecycle Management changes how the data protection policies treat it, as the Widget Corporation example that follows illustrates.
In the Widget Corporation example, the e-mail retention policy insisted that all customer e-mails be protected and available all the time. In a short time, this would lead to a huge e-mail database, and the backup database would grow equally large. Most of the e-mails, however, would be old and close to useless. Widget Corporation would soon be spending money to buy more storage for e-mails that no one needs anymore. The company has determined that customer e-mails are hardly ever accessed after two years and have no value after three years. The goals of the data protection policy can be amended to read as follows: All customer and prospect e-mails must be retained for two years. After two years, the e-mails are to be archived, and after three years, they are to be destroyed.

The e-mail retention policies can be described in plain language as:

Name: Customer E-Mail Retention and Destruction Policy
Type: E-Mail
Data Type: E-Mail
Parent: E-Mail Policy
Description: Policy governing the retention and destruction of customer e-mail
Purpose: To support continuing business operations by ensuring that previous e-mail communications with customers are available to Sales, Marketing, and Customer Service.
Creation Date: MAY 4, 2004
Revision Date: APR 1, 2005
Process: All e-mails to and from customers and potential customers (also known as prospects) will be copied to a duplicate copy of the e-mail database as they are received. End-users are not allowed to delete customer e-mails in any way, including from their personal mailboxes. The primary and duplicate e-mail databases will be backed up to tape each night; tapes will be rotated according to current IT policy (IT Tape Rotation Policy). Each month, a survey of the e-mail databases and tapes will be done. All customer e-mails two years old or older will be copied to DVD-ROM. They will then be deleted from the primary e-mail database, secondary e-mail database, and backup tapes. All end-users are expected to delete all copies of customer e-mails more than two years old each month. Each month, DVD-ROMs more than a year old will be sent to a shredding facility and destroyed.
Expected Results: All customer e-mails older than two years will be available on DVD-ROM. E-mail older than three years will be destroyed. All customer e-mail less than two years old will be available online at all times.
Constraints: There is no automated end-user e-mail deletion tool. End-users are expected to find and delete e-mails manually each month.
Assets: primary_email, secondary_email
Asset Type: Disk array
Asset: backup1
Asset Type: Backup server with attached autoloader
Asset: dvd_rom_1
Asset Type: DVD-ROM jukebox

By including Data Lifecycle Management concepts in the e-mail data protection policy, Widget Corporation does not need to increase the size of the e-mail storage as rapidly, saving money. The most valuable e-mails are given the highest degree of protection; less valuable ones are not.

DLM Automation
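The monthly archive-and-delete step in the process above is exactly the kind of task that is usually scripted. The following Python sketch shows one way such a monthly survey might look; the message-store interface (list_messages, copy_to_archive, delete_message) is hypothetical, and a real e-mail system would expose something different.

    from datetime import datetime, timedelta, timezone

    TWO_YEARS = timedelta(days=730)  # retention threshold from the Widget policy

    def monthly_email_survey(store, archive):
        """Archive customer e-mails two years old or older, then delete them.

        'store' and 'archive' are hypothetical interfaces to the e-mail
        databases and the DVD-ROM archive staging area, respectively.
        """
        cutoff = datetime.now(timezone.utc) - TWO_YEARS
        for msg in store.list_messages(category="customer"):
            if msg.received < cutoff:
                archive.copy_to_archive(msg)   # written to DVD-ROM in a later step
                store.delete_message(msg.id)   # remove from primary and secondary databases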
The Achilles' heel of policy-driven strategies is that they often require changes in human processes. System administrators have to perform certain tasks for the policy to be carried out. Users have to follow certain procedures, which makes them behave differently in their daily work. Forcing people to change how they perform normal duties leads to errors in judgment, mistakes, and outright subversion of the process. When a process is inconvenient, it is followed poorly or not at all.

Automation takes the work out of complying with policies. Users and administrators use software to perform the tasks that comprise the policy. When properly configured, the software does not make mistakes or balk at tiresome tasks. Data Lifecycle Management automation has two components: the policy engine and the data migration software.

The policy engine stores and executes the tasks, references, and constraints that express the DLM policy in terms that computer systems can understand. A policy is translated into a series of commands that other components of a system can then perform. The policy engine may translate a policy that states:

    Move all files from Finance to secondary storage after they are one year old.

to a command such as the following:

    moveOldFiles //Finance //FinanceBackup -365 day
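The translation step can be pictured as turning a declarative rule into an executable command. The Python sketch below is a minimal, assumed illustration; the rule fields and the generated moveOldFiles command line mirror the example above but are not drawn from any particular product.

    # Minimal sketch of a policy engine translating a DLM rule into a command.
    # The rule structure and the command syntax are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class MigrationRule:
        source: str        # e.g., "//Finance"
        destination: str   # e.g., "//FinanceBackup"
        max_age_days: int  # data older than this is moved

    def to_command(rule: MigrationRule) -> str:
        """Render the rule as the command line the migration software expects."""
        return f"moveOldFiles {rule.source} {rule.destination} -{rule.max_age_days} day"

    # Example: the Finance policy from the text becomes
    # moveOldFiles //Finance //FinanceBackup -365 day
    print(to_command(MigrationRule("//Finance", "//FinanceBackup", 365)))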
How the policy engine's rules are actually executed as system tasks depends on the data migration software. Data migration software may be little more than a group of scripts that execute operating system commands. It may also be very sophisticated software capable of moving data around a SAN, LAN, or WAN. No matter how the data migration software is constructed, its purpose is to migrate data from one data store to another. To support DLM fully, data migration software needs to be able to copy, move, and delete data based on age and physical location. Data migration software also needs to support a variety of media, especially disk, tape, and optical storage such as CD-RW.

Multi-tier Storage Architectures
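At its simplest, data migration software is a script that copies, moves, or deletes files based on age. The following Python sketch illustrates an age-based move under stated assumptions: the source and destination are ordinary directories, and age is judged by each file's modification time.

    import shutil
    from pathlib import Path
    from datetime import datetime, timedelta

    def move_old_files(source: str, destination: str, max_age_days: int) -> None:
        """Move files older than max_age_days from source to destination."""
        cutoff = datetime.now() - timedelta(days=max_age_days)
        dest = Path(destination)
        dest.mkdir(parents=True, exist_ok=True)
        for path in Path(source).iterdir():
            if path.is_file() and datetime.fromtimestamp(path.stat().st_mtime) < cutoff:
                shutil.move(str(path), str(dest / path.name))  # relocate to cheaper storage

    # Example, using hypothetical paths:
    # move_old_files("/data/finance", "/archive/finance", 365)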
To control the costs of policy-based data protection, IT organizations have turned to multi-tier storage architectures. Systems based on this architecture organize storage into several tiers or stages, with the most expensive, reliable, and available storage used for the most important data. Progressively less expensive storage is deployed for less important data, with archive systems occupying the lowest tier. This arrangement, coupled with Data Lifecycle Management software, allows data to be moved from more expensive to less expensive resources as it moves through its lifecycle. The top tier is typically composed of expensive, highly available Fibre Channel disk arrays supported by a full range of data protection systems. The next tier is often disk arrays with SATA drives, which are less expensive yet reliable and provide reasonable performance. The final tier is filled by archive systems: tape, optical disk (CD and DVD), or both (Figure 7-2).

Figure 7-2. Multi-tier storage architecture
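One way to picture the arrangement is as an ordered list of tiers, each pairing a class of storage with a level of data protection, to which data is assigned by age. The Python sketch below is an assumed illustration; the tier media, protection levels, and age thresholds are examples, not a prescription.

    # Illustrative multi-tier layout; media, protection, and thresholds are assumptions.
    TIERS = [
        # (max age in days, storage class, data protection applied)
        (180,          "Fibre Channel array",  "remote copy and replication"),
        (730,          "SATA array",           "nightly backup to tape"),
        (float("inf"), "tape/optical archive", "off-site storage of media"),
    ]

    def tier_for_age(age_days: float):
        """Return (storage class, protection) for data of a given age."""
        for max_age, storage, protection in TIERS:
            if age_days <= max_age:
                return storage, protection

    # Example: three-year-old data lands on the archive tier.
    print(tier_for_age(1095))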
As data ages, it is moved to less expensive storage systems, which drives down the cost of storing it. As the data migrates to less expensive systems, the level of protection is reduced as well. At the top tier, extensive and expensive data protection strategies such as remote copy and replication may be used, while at the lowest tier, off-site storage of CDs may be all that is done.

Do not confuse DLM and multi-tier storage. Hardware vendors tend to blur the lines between Data Lifecycle Management and multi-tier storage systems. DLM is a type of policy-based data management; multi-tier storage is a hardware architecture. There are many reasons to have multi-tier storage systems that have nothing to do with DLM, and DLM is not dependent on multi-tier storage. They do support each other well, though they are not the same.