Replicating PostgreSQL Data with Slony
Replication is the process of distributing data from database to database. Typically, the databases reside on different physical computers. PostgreSQL has supported a number of replication mechanisms over the years, but Slony is emerging as the preferred replication solution. Slony offers asynchronous, one-way, cascading replication of data from one origin to any number of subscribers.
Asynchronous means that the copy of the data that you find at a subscriber may not be the most current data: the subscriber may be "out of sync" with the origin. In fact, you may find different data at each subscriber. Asynchronous replication implies that a subscriber does not need to be up-and-running all the time. You can take a subscriber offline without affecting the origin; the subscriber catches up after you bring it back online. Of course, if a subscriber is offline for an extended period of time (and many data changes have occurred at the origin), it can take quite a while for the subscriber to catch up. Bringing a slow subscriber back in sync can also create a heavy load on your network and on the provider database.
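The catch-up behavior can be illustrated with a toy model. The sketch below is not Slony code and makes no claim about how Slony stores or ships changes; the `Origin` and `Subscriber` classes, the in-memory change log, and the `sync` and `lag` methods are all hypothetical names invented for illustration. It assumes only that the origin keeps an ordered list of changes and that each subscriber remembers how far it has read.

```python
class Origin:
    """Toy origin: records every change in an ordered, in-memory log."""

    def __init__(self):
        self.log = []  # list of (key, value) changes, oldest first

    def write(self, key, value):
        self.log.append((key, value))


class Subscriber:
    """Toy subscriber: applies changes from the origin's log when online."""

    def __init__(self):
        self.data = {}
        self.applied = 0    # number of log entries applied so far
        self.online = True

    def lag(self, origin):
        """Number of changes at the origin that this subscriber has not applied."""
        return len(origin.log) - self.applied

    def sync(self, origin):
        """Apply all pending changes; return how many were applied."""
        if not self.online:
            return 0  # an offline subscriber simply falls behind
        pending = origin.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


origin = Origin()
fast, slow = Subscriber(), Subscriber()

origin.write("x", 1)
fast.sync(origin)
slow.sync(origin)

slow.online = False          # take one subscriber offline
origin.write("x", 2)         # the origin keeps accepting changes
origin.write("y", 3)
fast.sync(origin)
slow.sync(origin)            # no effect while offline

print(fast.data, slow.data, slow.lag(origin))

slow.online = True           # bring it back; it catches up in one pass
slow.sync(origin)
print(slow.data == fast.data)
```

While the second subscriber is offline, the two subscribers hold different data, and the amount of catch-up work grows with the number of changes written at the origin in the meantime.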
One-way replication means that you can only change a replicated table at the origin. Slony copies any change made to the table at the origin to each subscriber node. At each subscriber site, Slony adds a trigger to the replicated table that prevents you from modifying the data. (Replicated tables are modifiable at the origin and read-only at each subscriber.)
A cascading replication mechanism (such as Slony) lets you lighten the load on the origin by cascading changes from subscriber to subscriber. Using a cascading topology means that the origin does not have to service every subscriber. Instead, one subscriber can service another. Every node in a replication cluster can act as a provider for one or more subscribers.
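To see how cascading lightens the load on the origin, it can help to compare two topologies on paper. The following sketch is a plain Python model, not Slony configuration: the node names (`origin`, `s1` through `s4`) and the helper functions `direct_load` and `path_to_origin` are hypothetical. Each topology is described as a mapping from a subscriber to the node that feeds it.

```python
from collections import Counter


def direct_load(providers):
    """Count how many subscribers each node feeds directly.

    `providers` maps a subscriber's name to the name of its provider.
    """
    return Counter(providers.values())


def path_to_origin(node, providers):
    """Return the chain of nodes from `node` back to the origin."""
    chain = [node]
    while chain[-1] in providers:
        upstream = providers[chain[-1]]
        if upstream in chain:
            raise ValueError("provider cycle involving " + upstream)
        chain.append(upstream)
    return chain


# Every subscriber is fed directly by the origin.
star = {"s1": "origin", "s2": "origin", "s3": "origin", "s4": "origin"}

# s1 forwards changes to s3 and s4, so the origin feeds only s1 and s2.
cascade = {"s1": "origin", "s2": "origin", "s3": "s1", "s4": "s1"}

print(direct_load(star)["origin"])      # subscribers the origin services in the star
print(direct_load(cascade)["origin"])   # subscribers the origin services in the cascade
print(path_to_origin("s3", cascade))    # route that changes take to reach s3, reversed
```

In the cascaded layout the origin services two subscribers instead of four; the cost is that `s3` and `s4` now depend on `s1` being available to forward changes.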
To avoid confusion, Slony documentation doesn't talk about master or slave nodes. Every node that participates in a Slony cluster can originate tables and subscribe to other tables. (That is, every node can act as both an origin and a subscriber.) Here's a quick preview of the terminology used by the Slony developers:
- Node: A node is a database that acts as an origin, a subscriber, or both.
- Cluster: A cluster (or, more precisely, a replication cluster) is a collection of nodes that share replication data.
- slon: The program (usually run as a background daemon) that copies replication data and configuration information from one node to another.
- Set: The smallest unit of replication; a set defines a collection of tables and sequences that are replicated from node to node.
- Path: The connection properties (hostname, user id, password, and so on) that describe how a slon daemon connects to another node.
- Event: A message sent from one node to other nodes to indicate that replication data or configuration changes are available on the originating node.
- Origin: The node that holds the master copy of a replication set.
- Subscriber: A node that holds a read-only copy of a replication set that has been copied (either directly or through a chain of other subscribers) from the set's origin.
- Provider: A node that provides replication data to a subscriber. (A provider is often an origin, but a provider may be a subscriber that forwards replication data to another subscriber.)
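The relationships among these terms can be sketched as a small data model. This is an illustrative sketch only; it does not reflect Slony's internal catalog or any Slony API. The `ReplicationSet` class, its `roles` method, and the node and table names are assumptions made up for the example. The model records, per set, which node is the origin and which provider feeds each subscriber.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationSet:
    """Toy model of a set: a group of tables replicated from one origin."""

    name: str
    origin: str                      # node holding the master copy
    tables: list
    providers: dict = field(default_factory=dict)  # subscriber -> provider

    def roles(self, node):
        """Return the roles `node` plays for this particular set."""
        found = set()
        if node == self.origin:
            found.add("origin")
        if node in self.providers:
            found.add("subscriber")
        if node in self.providers.values():
            found.add("provider")
        return found


# node_a originates the "orders" set; node_b subscribes and forwards to node_c.
orders = ReplicationSet(
    name="orders",
    origin="node_a",
    tables=["orders", "order_lines"],
    providers={"node_b": "node_a", "node_c": "node_b"},
)

# node_b originates a different set, to which node_a subscribes.
customers = ReplicationSet(
    name="customers",
    origin="node_b",
    tables=["customers"],
    providers={"node_a": "node_b"},
)

for node in ("node_a", "node_b", "node_c"):
    print(node, sorted(orders.roles(node)), sorted(customers.roles(node)))
```

The example shows why the terminology is defined per set rather than per node: `node_a` is the origin of one set and a subscriber to another, and `node_b` is both a subscriber and a provider for the same set.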