How Clustering Works in Quartz
Each node in a Quartz cluster is a separate Quartz application that is managed independently of the other nodes. This means that you must start and stop each node individually. Unlike clustering in many application servers, the separate Quartz nodes do not communicate with one another or with an administration node. Instead, the Quartz applications are made aware of one another through the database tables. (Future versions of Quartz will be designed so that nodes communicate with one another directly rather than through the database.)
Figure 11.1 shows that each node communicates directly with the database and has no knowledge of others outside the database.
Figure 11.1. Each node in a Quartz cluster is aware of the other instances only via the database.
Quartz Scheduler on Startup in a Cluster
The Quartz Scheduler itself is not cluster-aware, but the JDBC JobStore configured for the Scheduler is. When the Quartz Scheduler is started, it calls the schedulerStarted() method on the JobStore, which, as the name implies, tells the JobStore that the Scheduler has been started. The schedulerStarted() method is implemented in the JobStoreSupport class.
The JobStoreSupport class uses a property setting from the quartz.properties file (discussed shortly) to determine whether the Scheduler instance is participating in a cluster. If a cluster is configured, a new instance of the class ClusterManager is created, initialized, and started. The ClusterManager is an inner class within the JobStoreSupport class. The ClusterManager class, which extends java.lang.Thread, runs periodically and performs a check-in function for the Scheduler instance. When the clusterCheckin() method is called, the JobStoreSupport updates the database table SCHEDULER_STATE for the Scheduler instance. The Scheduler also checks to see if any of the other cluster nodes have failed. The check-in occurs periodically based on a configuration property (discussed shortly).
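The periodic check-in described above can be sketched in a few lines. This is a minimal illustration, not the Quartz source: the class names CheckinSketch and CheckinThread are hypothetical, and an in-memory map stands in for the SCHEDULER_STATE table so that the example runs without a database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CheckinSketch {
    // Hypothetical stand-in for the SCHEDULER_STATE table:
    // instance id -> time of last check-in, in milliseconds.
    static final Map<String, Long> schedulerState = new ConcurrentHashMap<>();

    // Simplified analogue of a cluster-manager thread (an assumption, not Quartz code).
    static class CheckinThread extends Thread {
        private final String instanceId;
        private final long checkinIntervalMs;
        private volatile boolean shutdown = false;

        CheckinThread(String instanceId, long checkinIntervalMs) {
            this.instanceId = instanceId;
            this.checkinIntervalMs = checkinIntervalMs;
            setDaemon(true);
        }

        // Record "this instance is alive" by refreshing its timestamp.
        void checkin() {
            schedulerState.put(instanceId, System.currentTimeMillis());
        }

        // Ask the thread to stop and wake it if it is sleeping.
        void shutdown() {
            shutdown = true;
            interrupt();
        }

        @Override
        public void run() {
            while (!shutdown) {
                checkin();
                try {
                    Thread.sleep(checkinIntervalMs);
                } catch (InterruptedException e) {
                    // Fall through; the loop condition re-checks the shutdown flag.
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CheckinThread thread = new CheckinThread("node-1", 50);
        thread.start();
        Thread.sleep(200);   // let a few check-ins happen
        thread.shutdown();
        thread.join();
        System.out.println("node-1 last check-in: " + schedulerState.get("node-1"));
    }
}
```

In a real cluster the timestamp would be written to the shared database rather than to a local map, which is what lets the other nodes see it.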
Detecting Failed Scheduler Nodes
When a Scheduler instance performs the check-in routine, it looks to see if there are other Scheduler instances that didn't check in when they were supposed to. It does this by inspecting the SCHEDULER_STATE table and looking for schedulers whose value in the LAST_CHECKIN_TIME column is older than the interval set by the property org.quartz.jobStore.clusterCheckinInterval (discussed in the next section). If one or more nodes haven't checked in, the running Scheduler assumes that the other instance(s) have failed.
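As a rough sketch of this rule, the following example flags every other instance whose last check-in is older than the check-in interval. The class and method names are hypothetical, the map stands in for rows of the state table, and the real JobStore may apply additional tolerance before declaring a node failed; only the simple comparison described above is shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailedNodeDetector {
    // Returns the ids of all instances (other than 'self') whose last
    // check-in is more than checkinIntervalMs before nowMs.
    static List<String> findFailedInstances(Map<String, Long> lastCheckinTimes,
                                            String self,
                                            long checkinIntervalMs,
                                            long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> row : lastCheckinTimes.entrySet()) {
            if (row.getKey().equals(self)) {
                continue; // an instance does not judge itself
            }
            if (nowMs - row.getValue() > checkinIntervalMs) {
                failed.add(row.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        Map<String, Long> state = new LinkedHashMap<>();
        state.put("node-1", 100_000L); // this instance, just checked in
        state.put("node-2", 99_000L);  // checked in 1 second ago
        state.put("node-3", 70_000L);  // silent for 30 seconds

        // With a 20-second interval, only node-3 is considered failed.
        System.out.println("Failed instances: "
                + findFailedInstances(state, "node-1", 20_000L, now));
        // prints "Failed instances: [node-3]"
    }
}
```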
Recovering Jobs from Failed Instances
When a Scheduler instance fails while it's executing a job, it's possible to get the job re-executed by another, working Scheduler. For this to happen, the job's recoverable property, configured in the JobDetail object (through setRequestsRecovery(true) in the Quartz 1.x API), must be set to true.
If the recoverable property is set to false (the default), a job that was running when its Scheduler failed is not re-executed. Instead, a different Scheduler instance fires the job at the trigger's next fire time, if there is one. How quickly a failed Scheduler instance is detected depends on the check-in interval of each Scheduler. This is discussed in the next section.
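The recovery decision can be modeled in a few lines. This is an illustrative sketch only: RecoverySketch and InFlightJob are hypothetical types, not part of the Quartz API, and the recoverable field simply mirrors the property described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoverySketch {
    // Hypothetical record of a job that was executing on a node when it failed.
    static final class InFlightJob {
        final String name;
        final boolean recoverable;

        InFlightJob(String name, boolean recoverable) {
            this.name = name;
            this.recoverable = recoverable;
        }
    }

    // Selects the jobs a surviving node should re-execute right away.
    // Jobs that are not recoverable are skipped here and simply wait
    // for their trigger's next fire time, if it has one.
    static List<String> jobsToReExecute(List<InFlightJob> jobsOnFailedNode) {
        List<String> toRecover = new ArrayList<>();
        for (InFlightJob job : jobsOnFailedNode) {
            if (job.recoverable) {
                toRecover.add(job.name);
            }
        }
        return toRecover;
    }

    public static void main(String[] args) {
        List<InFlightJob> inFlight = Arrays.asList(
                new InFlightJob("reportJob", true),
                new InFlightJob("cleanupJob", false));
        System.out.println("Re-execute now: " + jobsToReExecute(inFlight));
        // prints "Re-execute now: [reportJob]"
    }
}
```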
Configuring Quartz to Use Clustering