Oracle Real Application Clusters
Modern business requirements can be classified by the abilities, such as availability and reliability, that the enterprise system must provide in its day-to-day operations while serving user requests for data. This section discusses the abilities expected of today's business systems and analyzes how some of these requirements tie into the database tier of the enterprise system.
Figure 1.1 is a pictorial representation of the abilities that the modern enterprise system is expected to provide. Overall, these abilities could be assessed as basic requirements in every system. However, with the boom of Internet-based businesses in recent years, abilities such as availability, recoverability, scalability, manageability, and securability (security) have become a necessity. Additional elements, not classified under the abilities of a system, are still vital requirements, such as the throughput and response time of the system. The following discussion focuses primarily on the requirements of availability, scalability, manageability, and securability, but includes a brief mention of the other requirements.
1.2.1 Reliability
The reliability requirement states that when a user connects to the system to process a specific request, the system should provide the expected results or a reasonable response. This requirement applies both to the tools used in the development of the application and to the application logic that is specific to the business. In today's Internet business, if a reliable service cannot be provided, there is the risk of the customer turning to another Internet site for their needs, which results in lost business.
1.2.2 Portability
Portability is an important requirement that is seldom seriously considered as it relates more to newly established businesses. The requirement of portability ensures that the enterprise systems are portable to various platforms as the business grows and when bigger and more efficient hardware platforms are needed. What this entails is that the development of an application, the supporting tools, the layered products and infrastructure should be organized in such a manner that they could be made available on any hardware platform with minimal or no change. If the requirement of portability is not considered in the early stages of development, the need for it later could potentially require rewrites of the systems, resulting in very expensive development life cycles.
An example under this requirement category would be the selection of the database infrastructure. When businesses start, they normally start small, with perhaps one or two customers, a small server, a database, and an application that caters to a few segments of the total intended market and provides minimal functionality. There can be many reasons for a small start, chief among them the initial capital investment and the potential business risks. As the business gets established and more customers come aboard, the small server that was used to start the business may not have the capacity to handle the additional business. At this stage, the enterprise may decide to move to bigger and better platforms. The questions normally asked during this process are: Will the current application work on the new hardware? Will the infrastructure, including the database on the current hardware, be available on the new hardware? The answer is maybe; unless the database platform selected is available on multiple operating systems, the move may not be possible.
Thus, when selecting important tools and products, it is vital to select those that are supported on most operating systems and hardware platforms. An excellent example in this class is the Oracle RDBMS, which is available on almost all operating systems, from OpenVMS to Unix (including Linux) and Windows.
1.2.3 Recoverability
Recoverability means that the system should be recoverable from failures with minimal downtime. At a basic level it is the average time required to repair a failed system (for example, an Oracle instance) or the database. Database recoverability directly relates to the quality of the backup strategy in place for the enterprise. A good backup strategy is based on the requirements for recovery time. Ideally, depending on the type of business and its critical nature, the business requirements, or service level agreement (SLA), would state that the system should be up and running quickly after a disaster. The size of a database, the interval at which the backups are being performed and at what level the backups are being taken all affect the recovery time for a database.
Choosing a good backup scheme also depends on the recovery time allowed, or the mean time to recover (MTTR). MTTR is the desired time required to perform instance or media recovery on the database. For example, 10 minutes may be set as the goal for media recovery from a disk failure. A variety of factors influence MTTR for media recovery, including the speed of detection, the type of method used to perform media recovery and the size of the database. MTTR of a system should be very low. Under Oracle RDBMS, Recovery Manager (RMAN) provides a good solution toward meeting this business requirement. Features such as block level recovery and backup options such as cumulative, incremental, and full provide a good amount of flexibility to RMAN backup. Using the flashback query option, Oracle provides methods for users to query the database as of a specific time in the past.
System recoverability means that if the system crashes due to reasons such as power surge, power failure, network failure, CPU failure, etc., it should be back up and running in a certain amount of time as defined by the business requirements. In the case of Oracle there is instance recovery, i.e., when a node or instance crashes, recoverability is the mean time required for the instance to perform its recovery operations and be available for users to access. Oracle has made marked progress in this arena of recoverability, with the two-phase recovery process introduced in Oracle 9i.
1.2.4 Securability
The days when the applications were used in a small finite user community are gone. Under the client server model, the applications were used by a small named set of users and these users were identifiable as they belonged to the same organization.
Internet-based systems have databases that are accessible from all over the world. Consequently, security of data has become of utmost importance and a high-level requirement. Data is vital to a business and should be protected from hackers. Similarly, the dotcom boom has introduced a new level of application sharing through application service providers (ASPs), which allow many organizations, and users within these organizations, to access data from a common database. Data in this situation should be protected between organizations. That is, data that belongs to one organization should not be visible to others. Oracle has various levels of security available to protect data from outside hackers. Oracle's Advanced Security option provides encryption of data as it travels over the network. Another feature is the Virtual Private Database (VPD) option, where security can be implemented at the row and column level of tables.
1.2.5 Auditability
Auditability of data refers to the ability to retrieve sufficient information with respect to the creation of data, such as who created the data, why the data was created, who modified the data, when it was modified, etc. This requirement is important and has been in existence since computers were put into use for commercial operations. Organizations are required to maintain their financial information for many years to meet legal requirements. Basically, there should be a way to reconstruct a transaction when required. From the database perspective, the system, or tool, should be auditable to track changes that take place against the metadata. Oracle provides various auditing capabilities, such as the regular auditing options available to the database administrator (DBA) and some of the newer features such as LogMiner and flashback queries. Using the LogMiner feature, the DBA can go back in history (depending on how long the redo logs are retained) to retrieve, track (audit), or roll back changes. In Oracle 9i, Oracle has introduced another feature called undo management. Undo management is used in place of rollback segments and provides the ability to go back in time and examine operations against the database.
1.2.6 Manageability and maintainability
Manageability is a broad area with many aspects. Systems being developed should be easily maintainable and manageable from every tier of the enterprise system. While it may be common to assume that maintainability and manageability are the same thing, the two terms are, in fact, different. Maintainability refers to the everyday continuance or protection of a system, such as the implementation of system and functionality level changes to the system. Manageability refers to the monitoring, tuning, and organization of the system.
The manageability requirement entails that the application tier, network tier, database tier, etc., should be easily tunable. From the application tier perspective, there should be sufficient options available to manage and monitor the health of the systems. When the business application is developed, it should provide options to view and tune the various thresholds that affect application performance. Similarly, the development platforms selected should offer tools and features that support these requirements. The tools or methods used should offer visibility into the problems and internal operations of the operating system, the layered products, and the infrastructure such as the database, providing a means of understanding the issues and a method for fixing them. For example, Oracle's wait interface (V$SYSTEM_EVENT, V$SESSION_EVENT, V$SESSION_WAIT, and other tools) provides visibility into some of the internal behavior of the Oracle database, making it possible to approach an issue in a scientific manner.
Every system developed is subject to continuous change throughout its life cycle, from the initial inception or implementation, to upgrades of business functionality, to upgrades of technology, etc. Maintainability of the system is the opportunity to make changes to the system. Thus development servers and database servers selected should allow for configuration changes.
1.2.7 Scalability
Scalability is typically defined in one of two ways, either as the ability to mature the system in accordance with growth in business or as the ability of the application, or enterprise system, to accept additional users in accordance with growth in business without rewriting or redesigning systems. Scalability can be vertical or horizontal (linear). When considering the growth of an enterprise system, linear scalability should be the preferred choice of configuration when compared to vertical scalability. Linear scalability can also provide vertical scalability. While vertical scalability supports more users by increasing the capacity of the existing hardware, linear scalability supports more users by increasing the number of hardware systems (nodes) participating in the configuration. From a systems perspective, hardware clustering provides this. (A cluster is a group of independent hardware systems or nodes that are interconnected to act as a single computing resource.) Linear scalability brought about by clustered hardware solutions also provides distribution of user workload among many nodes. Oracle provides a large number of features that support scalability.
Combined with their respective operating systems, hardware clusters provide system level scalability. On the database front, features such as Real Application Clusters (RAC), which runs on a clustered operating system, take advantage of clustering. Adding the clustered database configuration to the mix helps in providing linear scalability at the database tier of the enterprise system.
Under a clustered database configuration such as RAC, as additional users start accessing the system and if there is a resource contention with the existing configuration, additional nodes and additional instances could be added to the system without much difficulty. Oracle Parallel Server (OPS), the predecessor to RAC technology, started with the feature of adding and removing instances as an Oracle solution. By taking advantage of the cluster interconnect technology to transfer and share information, RAC has taken this feature to the next level of scalability.
Another benefit that scalability indirectly provides is throughput for the enterprise system. Linear scalability in particular helps to achieve higher throughput: as more users connect to the system, they are distributed across more instances, and each of these instances can carry the workload that a single instance originally carried, so the system as a whole processes more data. The system should be able to provide sufficient throughput to meet the demand of users on the system.
1.2.8 Availability
Availability is another important requirement in today's Internet-based business and is often combined with reliability, because under most circumstances the two are grouped together as one requirement. To illustrate the difference, recall the reliability requirement discussed earlier: when a particular item is requested of the system but the user does not get the requested item and instead gets another item, the system is considered unreliable. However, if the system was not even up to take the request, then the system is said to be not available; consequently, availability becomes the issue. The common ground lies in everyday usage: a system that is unavailable is also commonly described as unreliable.
Availability is measured by the system's uptime for a given period of time. Normally, availability is calculated by the number of hours in a year that the system has been up on a continuous basis. The needs of a system could be loosely defined as the crossing point between the system uptime cost and business downtime cost. The cost of uptime increases at a very rapid rate, growing without bound as availability approaches 100%.
An example of this crossover is shown in Figure 1.2. In this example, the best point, according to the given data, would be 99.5% availability, with a lower availability being too costly in terms of business downtime and a higher availability being too costly in terms of wasted hardware/software resources.
When defining availability requirements, it is important to differentiate between the needs at critical times and the needs for other periods. This should be done on separate charts, each factoring in the cost of the operation for that period. Figure 1.2 illustrates the tradeoff. When the availability requirement is set too low, the cost of business downtime, translated into lost revenue, becomes too high. When it is set too high, downtime and the loss of business shrink, but the additional hardware/software resources required to achieve it may be wasted.
A linearly scalable clustered configuration not only provides support for increased workload but also provides distribution of workload amongst many nodes in the cluster. Linear scalability, with appropriate application architecture and products, provides availability of the enterprise systems. If designed and constructed well, the system could provide continuous availability of the application.
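The crossover idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost figures below are invented for the example and are not read from Figure 1.2, and the function names `crossover` and `cheapest` are hypothetical.

```python
# Hypothetical yearly costs (arbitrary currency units) for candidate
# availability levels. The numbers are invented for illustration only;
# they are NOT taken from Figure 1.2.
CANDIDATES = [
    # (availability %, cost of providing the uptime, cost of business downtime)
    (98.0, 20_000, 400_000),
    (99.0, 50_000, 200_000),
    (99.5, 100_000, 100_000),
    (99.9, 300_000, 20_000),
    (99.99, 900_000, 2_000),
]


def crossover(levels):
    """Return the level where uptime cost and downtime cost are closest."""
    return min(levels, key=lambda lvl: abs(lvl[1] - lvl[2]))


def cheapest(levels):
    """Return the level with the lowest combined cost."""
    return min(levels, key=lambda lvl: lvl[1] + lvl[2])


print("crossing point:", crossover(CANDIDATES)[0], "%")
print("lowest total cost:", cheapest(CANDIDATES)[0], "%")
```

With real cost data, the crossing point and the lowest-total-cost point need not coincide, which is one reason to chart critical and non-critical periods separately.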
Availability is also measured by the amount of time the system has been up and is available for operation. In defining availability of the system, the word ''system'' does not apply to just the database tier or the application tier, but to the complete enterprise system. This implies that every piece of equipment, including networks, servers, application controllers, disk subsystems, etc., should be considered for availability. Making all tiers of the enterprise system available also means that each tier should provide redundant hardware, helping to provide continuous service when one of the components fails.
Providing this type of availability is based on the actual requirement in place. If the requirement is 99.99% uptime in a 24 hour schedule, 365 days of the year, redundant architecture will become a necessity. However, if some amount of downtime is allowed and does not affect the entire business, then all of this redundancy would possibly not be required. Consequently, availability is measured by the amount of downtime that is allowed per year.
Taking a different perspective on the data represented in Figure 1.2, Table 1.1 and Figure 1.3 offer an analysis of the availability and the corresponding hours of downtime allowed. Each percentage value indicates that the system can only be down for a certain number of hours.
Availability Requirement | Expected Downtime per Year |
---|---|
99.995% | 0.5 hours |
99.97% | 2.5 hours |
99.8% | 17.5 hours |
99.5% | 43.8 hours or 1.8 days |
99% | 87.6 hours or 3.7 days |
98% | 175.2 hours or 7.3 days |
96% | 350.4 hours or 14.6 days |
6 * 16 | All day Sunday and 8 hours/night |
Table 1.1 provides the expected downtime per year for the various levels of availability requirements. The table illustrates the fact that the cost of availability rises substantially with each fractional increase in the availability requirement. This table is graphically represented in Figure 1.3, which indicates that a 99.97% availability requirement means that the system could only be down for about 2.5 hours in a year. This means 2.5 hours of downtime for any part of the enterprise system, including the database tier. In 2.5 hours on the database tier, nothing significant such as restores, database recovery, or upgrades is possible, however strong or foolproof the architecture and the supporting infrastructure are. The reality is that every system is prone to failures, and 2.5 hours per year of downtime is probably not sufficient.
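The arithmetic behind the table can be reproduced with a short Python sketch. It assumes a 365-day year (8,760 hours); the function name `allowed_downtime_hours` is hypothetical, and the table's entries are rounded versions of the values it computes.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming a non-leap year


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


# Print the downtime budget for the percentage rows of the table.
for pct in (99.995, 99.97, 99.8, 99.5, 99.0, 98.0, 96.0):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% -> {hours:.1f} hours ({hours / 24:.1f} days)")
```

The calculation is linear in the unavailable fraction, so each additional "nine" divides the downtime budget by ten.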
The main factor to consider in determining availability is to keep the mean time between failures (MTBF) high. MTBF is the average time, usually expressed in hours, that a component works without failure. It is calculated by dividing the total number of operating hours observed by the total number of failures. The term can also mean the length of time a user can reasonably expect a device or system to work before a failure occurs. Keeping the MTBF high enough to meet 99.99% availability in a normal configuration (e.g., a single instance Oracle configuration) is difficult in a database tier because every database, including Oracle, is prone to failure.
Modern business requirements stipulate that systems developed and implemented should be capable of providing continuous availability. Applications developed are prone to failures, and they cannot meet the requirement of continuous availability unless the underlying infrastructure supports it. This adds to the requirement that the products, tools, utilities, and infrastructure selected should also be capable of meeting this requirement. The percentage of availability, the right technology, and the actual solution are determined by preparing a cost-benefit analysis.
RAC provides functionality that supports an availability requirement close to 99.99% by allowing multiple instances to share a common database. RAC is a configuration of two or more nodes in which each instance communicates with a single, shared copy of the physical database. From any node, users can access the database to retrieve information. With some of the database features, such as Transparent Application Failover (TAF),[1] users connected to a failed node can be migrated to an available node transparently, as if no failure had happened.
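The MTBF calculation described above can be written as a minimal Python sketch. The first function follows the definition in the text. The second uses the common steady-state approximation availability = MTBF / (MTBF + MTTR), which is an assumption brought in here for illustration and is not stated in this section. The function names and sample numbers are hypothetical.

```python
def mtbf_hours(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating hours divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one observed failure is required")
    return operating_hours / failures


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Approximate availability as MTBF / (MTBF + MTTR).

    Assumption: the standard steady-state formula, not taken from the text.
    Both arguments must use the same time unit.
    """
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf / (mtbf + mttr)


# Hypothetical observation: 4 failures over one year (8,760 hours),
# each taking half an hour to repair.
mtbf = mtbf_hours(8760, 4)
print(f"MTBF: {mtbf} hours")
print(f"availability: {steady_state_availability(mtbf, 0.5):.5%}")
```

Under this approximation, availability improves either by raising MTBF (fewer failures) or by lowering MTTR (faster recovery).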
1.2.9 Response
When compared to the responsiveness of the system, all other business requirements could be deemed less important. Response time from a user's perspective is the time taken for the system to respond to a request. From the system's perspective, response time is the service time plus the wait time, i.e., the time taken by the system to gather the information requested by the user plus the time the system spends waiting on other hurdles encountered while gathering that information. Every application should be designed to meet its response time goal, because this requirement is critical to user satisfaction. It is user satisfaction that promotes loyalty of customers to a business and attracts new ones.
The service time depends on the ability of the application to process the information efficiently, which includes the efficiency of the SQL queries (if any) to retrieve data as well as the efficiency of the database tier to respond to these requests.
The wait time is affected by external factors such as network latency and disk latency, and it is controlled and improved by systematically and scientifically approaching the various factors that cause these waits. Oracle has evolved and matured over the years to expose details at the database internals level, which helps in approaching these issues scientifically and resolving waits to improve the response time of the system.
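The relationship between response time, service time, and wait time can be expressed as a small Python sketch. The class name `RequestTiming`, its fields, and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timing of one request, in milliseconds (hypothetical structure)."""

    service_ms: float  # time spent actually processing the request
    wait_ms: float     # time spent waiting (network, disk, contention, etc.)

    @property
    def response_ms(self) -> float:
        """Response time is service time plus wait time."""
        return self.service_ms + self.wait_ms

    @property
    def wait_fraction(self) -> float:
        """Share of the response time that was spent waiting."""
        total = self.response_ms
        return self.wait_ms / total if total > 0 else 0.0


# Hypothetical request: 40 ms of processing and 160 ms of waiting.
timing = RequestTiming(service_ms=40.0, wait_ms=160.0)
print(f"response: {timing.response_ms} ms, waiting: {timing.wait_fraction:.0%}")
```

Separating the two components shows where tuning effort pays off: a request dominated by wait time gains little from faster SQL alone.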
[1]TAF is discussed in Chapter 10 (Availability and Scalability).