Replication allows you to automatically distribute copies of data from one server to one or more destination servers at one or more remote locations. Data integrity is a key design point of SQL Server's replication capabilities. The data at replicating sites might be slightly outdated, but it will accurately reflect the true state of the data at a recent point in time and is guaranteed to reflect any changes made soon after they occur (or soon after a reliable network connection is established).
SQL Server 7 supports three modes of replication, which are detailed on the companion CD in a whitepaper entitled "Understanding SQL Server 7.0 Replication" (a brief configuration sketch follows the list):
- Transaction-based replication: In this mode of replication, one site is considered the owner, or publisher, of a published article (a table or a subset of a table), and other sites subscribe to that article. All changes made to the article must be made at the publisher and then replicated to the subscribers.
- Merge replication: Although the publisher defines the articles to be published, any participating site can make changes. System tables keep track of the changes, which are then propagated to all participants, including the original publisher. You can define which type of conflict resolution to use in the event of conflicting updates made by multiple sites.
- Snapshot replication: All data from the publishing site is periodically copied to all the subscribing sites. Individual changes aren't propagated. This method was formerly called a scheduled table refresh.
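As a concrete illustration of the transaction-based mode, the following minimal T-SQL sketch publishes a table and pushes a subscription using SQL Server's replication stored procedures. All database, object, and server names here are invented, and server-level setup (such as configuring the distributor) is omitted:

```sql
-- At the publisher: create a transactional publication.
-- @repl_freq = 'continuous' requests transaction-based replication;
-- 'snapshot' would request snapshot replication instead.
EXEC sp_addpublication
    @publication = 'CustomerProfiles',
    @repl_freq   = 'continuous'

-- Add a table (or a subset of one) to the publication as an article.
EXEC sp_addarticle
    @publication   = 'CustomerProfiles',
    @article       = 'Renters',
    @source_object = 'Renters'

-- Push the publication to a database on a subscribing server.
EXEC sp_addsubscription
    @publication    = 'CustomerProfiles',
    @subscriber     = 'CounterServer01',
    @destination_db = 'rental'
```

All changes to the Renters table are then made at the publisher and flow outward to CounterServer01 and any other subscribers.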
Distributed transactions using the two-phase commit protocol guarantee ACID properties, but replication doesn't. Replication isn't strongly consistent (the C in ACID); instead, it provides loosely consistent data. With the two-phase commit protocol, a transaction is an all-or-nothing proposition, and the data is assured to be strongly consistent. But inherent in the two-phase commit algorithm is the fact that a failure at any one site makes the entire transaction fail, or can keep the transaction in doubt for long periods of time, during which all participants must hold locks, crippling concurrency.
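To see why two-phase commit is so fragile in practice, consider a distributed update issued through MS DTC. This is a minimal sketch, assuming a linked server named RemoteSite and a hypothetical GoldCardProfile table at both sites; if either participant fails to vote to commit, the entire transaction rolls back everywhere:

```sql
BEGIN DISTRIBUTED TRANSACTION
    -- Local update.
    UPDATE GoldCardProfile
    SET    phone = '425-555-0100'
    WHERE  renter_id = 42

    -- Remote update via a linked server (four-part name).
    UPDATE RemoteSite.rental.dbo.GoldCardProfile
    SET    phone = '425-555-0100'
    WHERE  renter_id = 42
-- Both sites must vote to commit; until they do, locks are held
-- at every participant, and a failure at either site aborts both updates.
COMMIT TRANSACTION
```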
At first glance, you might think a system should require that updates be made at all sites in real time. In fact, when the costs of two-phase commit are realized (chiefly, the vulnerability that can result from a failure at just one node), the most pragmatic solution might be to make changes in real enough time.
For example, suppose you run a car rental agency with 500 rental counters worldwide, and you maintain a customer profile table containing 500,000 renters who belong to your Gold Card program. You want to store this customer profile locally at all 500 rental counters so that even if a communication failure occurs, the profile will be available wherever a Gold Card member might walk up to do business. Although all sites should have up-to-date records of all customers, it would be disastrous to insist that an update of the Gold Card profile must occur as part of a two-phase transaction for all 500 sites or not at all. With this scenario, because of the realities of worldwide communications, or because at one site a storm might have knocked out the power, you probably wouldn't be able to perform a simple update to the customer profile very often.
Replication is a much better solution in such a case. The master customer profile table would be maintained at your corporate headquarters. Replication publishes this data, and the rental counters then subscribe to this information. When customer data is changed, or when a customer is added or removed, these changes (and only the changes) are propagated to all the subscribing sites. In a well-connected network, the time delay might be just a few seconds. If a particular site is unavailable, no other sites are affected; they still get their changes. When the unavailable site is back online, the changes are automatically propagated and the subscriber is brought up to date. At any time, a given rental counter might not have exactly the same information as the corporate site; it might be slightly out of date. The data at the rental counter is consistent with the state of the data at the corporate headquarters at some earlier time; it's not necessarily consistent with the current corporate site data. This is considered loosely consistent, as opposed to the strongly consistent model of two-phase commit, in which all sites (or none) immediately reflect the change.
Although a time delay can occur in loosely consistent systems, maintaining transactional consistency is one of the chief design points of SQL Server replication. If multiple updates occur as a single atomic transaction to data being replicated, the entire transaction is also replicated. At the subscribing site, the transaction will either commit in its entirety or be replicated again until it does.
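As an illustration, suppose a single transaction touches two published tables (the table names below are hypothetical). Replication delivers the pair of changes to each subscriber as one unit; a subscriber never sees the debit without the matching log row:

```sql
BEGIN TRANSACTION
    UPDATE Account
    SET    balance = balance - 100
    WHERE  account_id = 1

    INSERT AuditLog (account_id, amount)
    VALUES (1, -100)
COMMIT TRANSACTION
-- The committed transaction is picked up as a whole from the log;
-- subscribers apply both changes or, until a retry succeeds, neither.
```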
With SQL Server, data can be replicated continuously or at specified intervals. It can be replicated in its entirety or as filtered subsets (known as horizontal and vertical partitions). In addition to replicating to other SQL Servers, version 7 can replicate to heterogeneous data sources that have an appropriate ODBC driver or OLE DB provider available. These include Access databases, Oracle databases, and other ODBC data sources. Additionally, SQL Server 7 can replicate from other data sources, using ODBC drivers or OLE DB providers. These sources include Oracle 8, Microsoft Jet version 4, and IBM DB2.
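Filtered subsets are defined when an article is created. The sketch below, with invented object and column names, shows a horizontal partition via a row filter and a vertical partition that excludes a column; the parameter values are illustrative:

```sql
-- Horizontal partition: publish only Gold-level renters.
EXEC sp_addarticle
    @publication   = 'CustomerProfiles',
    @article       = 'GoldRenters',
    @source_object = 'Renters',
    @filter_clause = 'member_level = ''Gold'''

-- Vertical partition: exclude a sensitive column from the article.
EXEC sp_articlecolumn
    @publication = 'CustomerProfiles',
    @article     = 'GoldRenters',
    @column      = 'credit_card_no',
    @operation   = 'drop'
```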
Unlike SQL Server, some products on the market promote replication as an "Update Anywhere, Anytime, Anyway" model. However, this model behaves in an inherently unstable way when many nodes participate and update activity is moderate to heavy: updates made at multiple sites conflict with one another and must be reconciled. SQL Server 7's merge replication does allow updates at multiple sites, but it's not a true "Update Anywhere, Anytime, Anyway" solution. Refer to the whitepaper on the companion CD for more details on exactly what SQL Server merge replication can do.