How Clustering Works in Quartz

Each node in a Quartz cluster is a separate Quartz application that is managed independently of the other nodes. This means that you must start and stop each node individually. Unlike clustering in many application servers, the separate Quartz nodes do not communicate with one another or with an administration node. Instead, the Quartz applications are made aware of one another through the database tables. (Future versions of Quartz are expected to let nodes communicate with one another directly rather than through the database.)

Quartz Clustering Works Only When Using a JDBC JobStore

Because clustered nodes rely on the database to communicate the state of a Scheduler instance, you can use Quartz clustering only when using a JDBC JobStore. This means that you must be using either JobStoreTX or JobStoreCMT for job storage; you can't use RAMJobStore with clustering. A future release will most likely remove this requirement and let nodes communicate directly with one another through a network protocol, possibly by using JGroups.
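The JDBC-JobStore requirement can be expressed as a simple configuration rule. The following is a minimal sketch, assuming the configuration is held in a java.util.Properties object in quartz.properties style. The keys org.quartz.jobStore.class and org.quartz.jobStore.isClustered and the three JobStore class names are Quartz's own; the class ClusterConfigSketch and its isValidClusterConfig() check are hypothetical and only illustrate the rule stated above. A real clustered configuration also needs a data source and other JDBC settings, which are omitted here.

```java
import java.util.Properties;

// Hypothetical helper: illustrates that clustering requires a JDBC JobStore.
// Property keys and JobStore class names are Quartz's; the validation is not.
final class ClusterConfigSketch {

    static final String JOB_STORE_CLASS = "org.quartz.jobStore.class";
    static final String IS_CLUSTERED = "org.quartz.jobStore.isClustered";

    // JDBC-backed JobStores named in the text.
    static final String JOB_STORE_TX = "org.quartz.impl.jdbcjobstore.JobStoreTX";
    static final String JOB_STORE_CMT = "org.quartz.impl.jdbcjobstore.JobStoreCMT";
    // In-memory store: nothing is shared between nodes, so it cannot cluster.
    static final String RAM_JOB_STORE = "org.quartz.simpl.RAMJobStore";

    // Builds the two clustering-related settings for a JobStoreTX node.
    static Properties clusteredJobStoreTx() {
        Properties p = new Properties();
        p.setProperty(JOB_STORE_CLASS, JOB_STORE_TX);
        p.setProperty(IS_CLUSTERED, "true");
        return p;
    }

    // Hypothetical check: a clustered node must use a JDBC JobStore, because the
    // database is the only channel the nodes share.
    static boolean isValidClusterConfig(Properties p) {
        boolean clustered = Boolean.parseBoolean(p.getProperty(IS_CLUSTERED, "false"));
        if (!clustered) {
            return true; // a non-clustered Scheduler may use any JobStore
        }
        // Quartz falls back to RAMJobStore when no JobStore class is configured.
        String store = p.getProperty(JOB_STORE_CLASS, RAM_JOB_STORE);
        return store.equals(JOB_STORE_TX) || store.equals(JOB_STORE_CMT);
    }

    public static void main(String[] args) {
        Properties good = clusteredJobStoreTx();
        Properties bad = new Properties();
        bad.setProperty(JOB_STORE_CLASS, RAM_JOB_STORE);
        bad.setProperty(IS_CLUSTERED, "true");
        System.out.println(isValidClusterConfig(good)); // true
        System.out.println(isValidClusterConfig(bad));  // false
    }
}
```

Note that a clustered configuration that omits org.quartz.jobStore.class is also rejected by this sketch, because the fallback store is RAMJobStore.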

Figure 11.1 shows that each node communicates directly with the database and has no knowledge of the other nodes except through the database.

Figure 11.1. Each node in a Quartz cluster is aware of the other instances only via the database.

 

Quartz Scheduler on Startup in a Cluster

The Quartz Scheduler itself is not cluster-aware, but the JDBC JobStore configured for the Scheduler is. When the Quartz Scheduler is started, it calls the schedulerStarted() method on the JobStore, which, as the name implies, tells the JobStore that the Scheduler has been started. The schedulerStarted() method is implemented in the JobStoreSupport class.

The JobStoreSupport class uses a property setting from the quartz.properties file (discussed shortly) to determine whether the Scheduler instance is participating in a cluster. If clustering is configured, a new instance of the ClusterManager class is created, initialized, and started. ClusterManager is an inner class of JobStoreSupport; it extends java.lang.Thread, runs periodically, and performs a check-in function for the Scheduler instance. When the clusterCheckin() method is called, JobStoreSupport updates the Scheduler instance's row in the SCHEDULER_STATE database table and also checks whether any of the other cluster nodes have failed. How often the check-in occurs is controlled by another configuration property, also discussed shortly.
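The periodic check-in thread described above can be modeled in a few lines. This is a minimal sketch, not Quartz's ClusterManager source: it assumes an in-memory map as a stand-in for the SCHEDULER_STATE table, and the names ClusterManagerSketch, StateTable, checkin(), and shutdownManager() are hypothetical. Like ClusterManager, it extends java.lang.Thread and records a check-in for its own Scheduler instance on every pass; the search for failed nodes is left out here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a ClusterManager-style check-in thread. Not Quartz's code.
final class ClusterManagerSketch extends Thread {

    // Hypothetical in-memory stand-in for the SCHEDULER_STATE table:
    // instance id -> last check-in time in milliseconds.
    static final class StateTable {
        private final Map<String, Long> lastCheckin = new ConcurrentHashMap<>();

        void checkin(String instanceId, long nowMillis) {
            lastCheckin.put(instanceId, nowMillis);
        }

        Long lastCheckinOf(String instanceId) {
            return lastCheckin.get(instanceId);
        }
    }

    private final StateTable table;
    private final String instanceId;
    private final long checkinIntervalMillis;
    private final AtomicInteger checkins = new AtomicInteger();
    private volatile boolean shutdown = false;

    ClusterManagerSketch(StateTable table, String instanceId, long checkinIntervalMillis) {
        super("ClusterManagerSketch-" + instanceId);
        this.table = table;
        this.instanceId = instanceId;
        this.checkinIntervalMillis = checkinIntervalMillis;
        setDaemon(true); // do not keep the JVM alive just for check-ins
    }

    void shutdownManager() {
        shutdown = true;
        interrupt(); // wake the thread if it is sleeping between check-ins
    }

    int checkinCount() {
        return checkins.get();
    }

    @Override
    public void run() {
        do {
            // Record this node's check-in. Quartz would also look for
            // other nodes that missed their check-ins at this point.
            table.checkin(instanceId, System.currentTimeMillis());
            checkins.incrementAndGet();
            try {
                Thread.sleep(checkinIntervalMillis);
            } catch (InterruptedException e) {
                // Interrupted by shutdownManager(); the loop condition decides whether to stop.
            }
        } while (!shutdown);
    }

    public static void main(String[] args) throws InterruptedException {
        StateTable table = new StateTable();
        ClusterManagerSketch manager = new ClusterManagerSketch(table, "node-A", 50);
        manager.start();
        Thread.sleep(200);
        manager.shutdownManager();
        manager.join();
        System.out.println("check-ins recorded: " + manager.checkinCount());
    }
}
```

The do/while loop guarantees at least one check-in as soon as the thread starts, so a node announces itself immediately rather than after its first interval.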

Detecting Failed Scheduler Nodes

When a Scheduler instance performs the check-in routine, it looks for other Scheduler instances that didn't check in when they were supposed to. It does this by inspecting the SCHEDULER_STATE table for schedulers whose LAST_CHECKIN_TIME value is older than the current time minus the interval set by the property org.quartz.jobStore.clusterCheckinInterval (discussed in the next section). If one or more nodes haven't checked in within that window, the running Scheduler assumes that those instances have failed.
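The stale check-in rule can be written as a pure function, which makes the arithmetic easy to follow. This is a minimal sketch under stated assumptions: a map stands in for the instance-name and last-check-in columns of SCHEDULER_STATE, and FailedNodeDetector and findFailedInstances() are hypothetical names. It applies only the rule as described in the text and does not try to reproduce any extra tolerance that Quartz's own JobStoreSupport code might add.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "missed check-in" test; not Quartz's implementation.
final class FailedNodeDetector {

    // Returns the instances whose last check-in is older than (now - interval),
    // skipping the instance doing the checking.
    static List<String> findFailedInstances(Map<String, Long> lastCheckinByInstance,
                                            String selfId,
                                            long nowMillis,
                                            long checkinIntervalMillis) {
        List<String> failed = new ArrayList<>();
        long cutoff = nowMillis - checkinIntervalMillis;
        for (Map.Entry<String, Long> row : lastCheckinByInstance.entrySet()) {
            if (row.getKey().equals(selfId)) {
                continue; // a node never declares itself failed
            }
            if (row.getValue() < cutoff) {
                failed.add(row.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Long> state = new TreeMap<>(); // sorted for a stable print order
        state.put("node-A", 100_000L); // the node doing the check
        state.put("node-B", 95_000L);  // checked in 5 s ago
        state.put("node-C", 70_000L);  // checked in 30 s ago
        // Interval 15 s at time 100 000 ms gives a cutoff of 85 000 ms.
        System.out.println(findFailedInstances(state, "node-A", 100_000L, 15_000L)); // [node-C]
    }
}
```

With a 15-second interval, node-B (5 seconds since its last check-in) is considered alive, while node-C (30 seconds) is presumed failed.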

Running Nodes on Separate Machines with Unsynchronized Clocks

As you can probably tell by now, if you run nodes on different machines whose clocks are not synchronized, you can get unexpected results. This is because each node records the time of its last check-in as a timestamp taken from its own clock, and the other instances compare that timestamp against their own clocks. If a node's clock is set ahead, its timestamps appear to lie in the future, so a running Scheduler might never realize that the node has gone down. Conversely, if a node's clock is set behind, the other nodes might wrongly conclude that it has gone down and attempt to take over and rerun its jobs. Neither behavior is what you want. When you're using different machines in a cluster (which is the normal case), be sure to synchronize their clocks. See the section "Quartz Clustering Cookbook," later in this chapter, for details on how to do this.
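Both failure modes follow directly from the arithmetic of the check-in test. The following worked example is a minimal sketch, assuming the observing node's clock is correct, a 15-second check-in interval, and the simple "older than now minus interval" rule from the previous section; ClockSkewDemo and presumedFailed() are hypothetical names, and the skew values are illustrative, not measured.

```java
// Hypothetical worked example of clock skew; not Quartz code.
// Each node stamps its check-in with its OWN clock; the observer compares
// that stamp against ITS clock.
final class ClockSkewDemo {

    // A node is presumed failed when its last check-in is older than
    // (observer's now - check-in interval).
    static boolean presumedFailed(long recordedCheckin, long observerNow, long intervalMillis) {
        return recordedCheckin < observerNow - intervalMillis;
    }

    public static void main(String[] args) {
        long interval = 15_000L;
        long trueNow = 1_000_000L; // observer's clock, assumed correct

        // Clock 1 hour ahead: the node crashed 10 minutes ago, but its last
        // timestamp (4 000 000 ms) still lies in the observer's future.
        long crashedAheadCheckin = (trueNow - 600_000L) + 3_600_000L;
        System.out.println(presumedFailed(crashedAheadCheckin, trueNow, interval)); // false: failure missed

        // Clock 1 minute behind: a healthy node checked in 1 s ago, but its
        // timestamp (939 000 ms) looks about a minute old.
        long healthyBehindCheckin = (trueNow - 1_000L) - 60_000L;
        System.out.println(presumedFailed(healthyBehindCheckin, trueNow, interval)); // true: false alarm
    }
}
```

In the first case the crashed node stays "alive" until the observer's clock passes its future timestamp plus the interval; in the second, the healthy node is flagged on every check, which is what triggers the unwanted takeover.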

 

Recovering Jobs from Failed Instances

When a Scheduler instance fails while it's executing a job, another working Scheduler can re-execute that job. For this to happen, the job must be marked as recoverable: its requestsRecovery property, configured in the JobDetail object (for example, through setRequestsRecovery(true)), must be set to true.

If requestsRecovery is set to false (the default) and a Scheduler fails while running the job, the job won't be re-executed; instead, a different Scheduler instance will fire it at the trigger's next fire time, if there is one. How quickly a failed Scheduler instance is detected depends on the check-in interval of each Scheduler. This is discussed in the next section.
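The recovery rule amounts to a three-way decision for each job that was running on the failed node. This is a minimal sketch, not Quartz's recovery code: RecoverySketch, FiredJob, and plan() are hypothetical names, and the requestsRecovery field simply mirrors the recovery flag on JobDetail (requestsRecovery in the Quartz JobDetail API). A null nextFireTimeMillis is assumed to mean the trigger has no further fire times.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how a surviving node might treat jobs that were
// executing on a failed node. Not Quartz's actual recovery code.
final class RecoverySketch {

    // Hypothetical record of a job that was in flight on the failed node.
    static final class FiredJob {
        final String jobName;
        final boolean requestsRecovery; // mirrors the JobDetail recovery flag
        final Long nextFireTimeMillis;  // null when the trigger will not fire again

        FiredJob(String jobName, boolean requestsRecovery, Long nextFireTimeMillis) {
            this.jobName = jobName;
            this.requestsRecovery = requestsRecovery;
            this.nextFireTimeMillis = nextFireTimeMillis;
        }
    }

    // Decides, per job, what happens once the failure has been detected.
    static List<String> plan(List<FiredJob> inFlightOnFailedNode) {
        List<String> actions = new ArrayList<>();
        for (FiredJob job : inFlightOnFailedNode) {
            if (job.requestsRecovery) {
                actions.add(job.jobName + ": re-execute now");
            } else if (job.nextFireTimeMillis != null) {
                actions.add(job.jobName + ": wait for next fire time " + job.nextFireTimeMillis);
            } else {
                actions.add(job.jobName + ": not run again");
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<FiredJob> inFlight = new ArrayList<>();
        inFlight.add(new FiredJob("report", true, 5_000L));
        inFlight.add(new FiredJob("cleanup", false, 5_000L));
        inFlight.add(new FiredJob("oneShot", false, null));
        plan(inFlight).forEach(System.out::println);
    }
}
```

Running main prints one line per job: the recoverable "report" job is re-executed immediately, "cleanup" waits for its trigger's next fire time, and "oneShot" is not run again because its trigger has no further fire times.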

Configuring Quartz to Use Clustering
