Node Manager
WebLogic provides a standalone Java tool called the Node Manager, which is responsible for managing the availability of all Managed Servers running on a machine. It runs as a dedicated process on a machine, either as a daemon on a Unix machine or as a service on the Windows platform. It provides a way to automatically restart Managed Servers in the case of failure, and even handles servers that are in a "failed" state. A Node Manager also lets the Administration Server remotely start, kill, and monitor Managed Server instances. A single Node Manager process should run on every machine that hosts Managed Servers. When the Node Manager boots a server, it creates a separate process for that server, just as if you had run the startManagedWebLogic script on that machine.
Figure 13-5 illustrates the role of Node Managers in a domain.
Figure 13-5. Node Managers act as agents to the Administration Server
In order to control the life cycle of a Managed Server, using either the Administration Server or the weblogic.Admin tool, you must start the servers under the control of a Node Manager. For instance, if you restart a Managed Server remotely using the Administration Console, the Administration Server contacts the appropriate Node Manager to perform the task. Even though you have no explicit, direct control over the Node Managers, they act as agents for the Administration Server.
Even though the Administration Server works closely with the Node Managers running on the different machines that host the Managed Servers in the domain, a Node Manager is still outside the scope of a WebLogic domain. The same Node Manager monitors all Managed Servers on a machine, regardless of the domains to which they belong.
It is important to note that when Node Managers are used, each server is started up through the Node Manager running on the machine, using either the Administration Console or a JMX application, and not through your own startup scripts.
Finally, Node Managers use SSL in their communication. The Administration Server talks to the Node Managers using (short-lived) two-way SSL-protected messages, ensuring that only authorized Administration Servers can control the Node Managers. In addition, the Node Manager itself uses an SSL connection with each of the Managed Servers under its control. This connection remains alive for the entire duration that a Managed Server is up, and is used to monitor the server.
Configuring the Node Managers can be a little tricky, but once you have set them up, you can leave them humming away by themselves without any further intervention. The following sections look at how to configure a machine to use a Node Manager, how to configure a server to use a Node Manager, and where to locate the Node Manager logs in case things go wrong. The configuration is a three-part process:
- Configure the Node Manager on each physical machine that hosts the Managed Servers of the domain.
- Configure each machine in a WebLogic Domain to use a Node Manager.
- Assign each Managed Server to a machine, and configure the interaction between the Managed Server and the Node Manager assigned to the machine.
13.7.1 Configuring the Node Manager for a Physical Machine
Every physical machine should have a single Node Manager instance running. Security is the most important aspect of the configuration. WebLogic tries hard to ensure that only authorized users can access the Node Manager; otherwise, it could be used to tamper with your servers. For this reason, WebLogic secures the Node Managers in two ways:
- You can instruct the Node Manager to accept connections from certain trusted hosts only.
- The Node Manager and the Administration Server communicate with each other through SSL. So, both the Administration Server and the Node Managers must be configured to use SSL.
13.7.1.1 Trusted hosts
In order to configure a list of trusted hosts for a Node Manager, you must create a text file with the addresses of all Administration Servers that are allowed to contact the Node Manager. Each line specifies either the IP address or the DNS host name of an Administration Server. By default, a Node Manager uses the nodemanager.hosts file located under the WL_HOME\common\nodemanager folder.[5] For example, you could have the following entries in the file:
wladmin.oreilly.com
10.0.10.10
[5] In WebLogic 7.0, it's in the config subdirectory of this folder.
The default entries allow access from the local host only. You can create a different trusted hosts file, and modify the Node Manager's startup script so that it specifies the location of this file:
java -Dbea.home=%BEA_HOME% -Dweblogic.nodemanager.javaHome=%JAVA_HOME% -Dweblogic.nodemanager.trustedHosts=nodemanager.myhosts ... -Dweblogic.ListenAddress=10.0.10.10 weblogic.nodemanager.NodeManager
If you specify DNS names, you also must enable a reverse DNS lookup for the Node Manager (by default, it is not enabled). To do this, simply specify an additional system property in the startup script:
-Dweblogic.nodemanager.reverseDnsEnabled=true
The Node Manager then will accept connections only from an Administration Server running on one of the addresses specified in the trusted hosts file.
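The trusted hosts check can be pictured with a short Python sketch. It is only an illustration of the rule described above, not the Node Manager's actual code: the function names load_trusted_hosts and is_trusted are hypothetical, and the sketch assumes that a DNS entry in the file matches only when reverse DNS lookup has been enabled.

```python
import socket
from typing import Callable, Iterable, Optional, Set


def load_trusted_hosts(lines: Iterable[str]) -> Set[str]:
    """Collect the non-empty entries (one IP address or DNS name per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def _reverse_lookup(ip: str) -> Optional[str]:
    """Resolve an IP address to a host name, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def is_trusted(
    peer_ip: str,
    trusted: Set[str],
    reverse_dns_enabled: bool = False,
    reverse_lookup: Callable[[str], Optional[str]] = _reverse_lookup,
) -> bool:
    """Return True if the connecting address matches the trusted hosts list."""
    if peer_ip in trusted:
        return True
    if not reverse_dns_enabled:
        # Assumption: DNS entries are ignored unless reverse DNS is enabled.
        return False
    name = reverse_lookup(peer_ip)
    return name is not None and name.lower() in trusted
```

With the two sample entries shown earlier, a connection from 10.0.10.10 is always accepted, whereas a connection that matches only by the name wladmin.oreilly.com is accepted only when reverse DNS is enabled.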
13.7.1.2 SSL configuration
Because all communication between the Administration Server and the Node Manager uses SSL, both the server and Node Manager must have SSL configured. Refer to Chapter 16 for the necessary SSL background. A Node Manager uses the same public key infrastructure as WebLogic Server itself, and the default installation uses the DemoIdentity.jks and DemoTrust.jks stores. So, if you just want to get everything going, you can use the default configuration and ignore the rest of this setup.
The best way to modify the default setup is to edit the nodemanager.properties file in the WL_HOME\common\nodemanager directory. Alternatively, you can specify any of the system properties from the command line when starting up the Node Manager. The default nodemanager.properties file also provides the syntax for most properties. For example, depending on which keystores you wish to use, the KeyStores property can take any of the following values:
#Possible values for the Keystores property
#KeyStores = [DemoIdentityAndDemoTrust|CustomIdentityAndJavaStandardTrust|CustomIdentityAndCustomTrust]
Here is an example property file:
KeyStores=CustomIdentityAndCustomTrust
CustomIdentityKeyStoreFileName=y:\mystores\myIdentityStore.jks
CustomIdentityKeyStorePassPhrase=mykeystorepass
CustomIdentityKeyStoreType=JKS
CustomIdentityAlias=myalias
CustomIdentityPrivateKeyPassPhrase=mypassword
CustomTrustKeyStoreFileName=y:\server\lib\DemoTrust.jks
#These are commented out as the default trust store doesn't need them
#CustomTrustKeyStorePassPhrase=mypassphrase
#CustomTrustKeyStoreType=JKS
#CustomTrustKeyPassPhrase=mykeypass
This file sets up a custom identity and trust store for the Node Manager, which is typical of most production deployments. It references the demonstration trust store and an example identity store that is described in Chapter 16. After restarting the Node Manager, all of the pass phrases will be encrypted.
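Before restarting the Node Manager, it can be handy to sanity-check a property file of this kind. The following Python sketch is one possible approach under stated assumptions: the helper names parse_properties and check_keystore_settings are hypothetical, the parser handles only simple key=value lines and # comments (it does not implement the escaping rules of Java property files), and the idea that a custom keystore mode needs the matching file name property is our own reading of the example rather than a documented rule.

```python
from typing import Dict, Iterable, List

# The three values listed in the default nodemanager.properties file.
ALLOWED_KEYSTORES = {
    "DemoIdentityAndDemoTrust",
    "CustomIdentityAndJavaStandardTrust",
    "CustomIdentityAndCustomTrust",
}


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_keystore_settings(props: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    # Assumption: with no KeyStores entry, the demonstration stores are used.
    mode = props.get("KeyStores", "DemoIdentityAndDemoTrust")
    if mode not in ALLOWED_KEYSTORES:
        return [f"unknown KeyStores value: {mode}"]
    problems: List[str] = []
    if mode.startswith("CustomIdentity") and "CustomIdentityKeyStoreFileName" not in props:
        problems.append("CustomIdentityKeyStoreFileName is missing")
    if mode.endswith("CustomTrust") and "CustomTrustKeyStoreFileName" not in props:
        problems.append("CustomTrustKeyStoreFileName is missing")
    return problems
```

Run against the example file above, the check reports no problems, because both the identity and the trust keystore file names are present.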
13.7.1.3 SSL for WebLogic 7.0
SSL configuration for a Node Manager in WebLogic 7.0 is slightly different. You can either use your own key and certificate files, or in a test setup, use the sample key and certificate files that are supplied with WebLogic's installation. The demonstration SSL certificate and key files are located in the WL_HOME\common\nodemanager\config directory, as well as in the root directory of any domain created using the Configuration Wizard.
Once you have the required SSL certificate and key files, you need only to specify additional system properties in the Node Manager's startup script:
weblogic.nodemanager.keyFile
This property identifies the path to the key file.
weblogic.nodemanager.keyPassword
This property specifies the password to use if the key file is encrypted.
weblogic.nodemanager.certificateFile
This property identifies the path to the certificate file.
weblogic.security.SSL.trustedCAKeyStore
This property identifies the path to the keystore that holds the trusted CA certificates.
weblogic.nodemanager.sslHostNameVerificationEnabled
This property causes the Node Manager to perform hostname verification of the Administration Server that is communicating with it.
Chapter 16 provides a more detailed explanation of SSL configuration for WebLogic.
13.7.1.4 Additional configuration properties
Table 13-2 provides a list of additional system properties that you may need to specify. For instance, you may wish to modify the listen address for the Node Manager. All of these properties can simply be placed in the nodemanager.properties file. In WebLogic 7.0, you must specify them from the command line.
Property name | Description | Default |
---|---|---|
JavaHome | This property specifies the Java home that should be used to start the Managed Servers. Otherwise, it uses the Java home defined in the Remote Start tab for the server. If that is not defined, it uses the Java home used to start the Node Manager itself. | None |
WeblogicHome | This property sets the WebLogic home directory. You also can specify it on a per-server basis on a server's Remote Start tab. | None |
ListenAddress | This property sets the address on which the Node Manager should listen. | All IP addresses assigned to the machine |
ListenPort | This property determines the port number on which the Node Manager should listen. | 5555 |
NativeVersionEnabled | This property defines whether the Node Manager will run in a native mode. | true |
ReverseDnsEnabled | This property defines whether reverse DNS may be used to resolve addresses in the trusted host file. | false |
SavedLogsDirectory | This property determines where the log files will be written. | ./NodeManagerLogs |
TrustedHosts | This property determines the file containing the list of all trusted hosts. | ./nodemanager.hosts |
ScavengerDelaySeconds | This property is used if a server is started using the Node Manager. It will wait for this number of seconds before expecting a response from the server. Otherwise, it considers the task to have failed. | 60 seconds |
StartTemplate | This property is used by Unix systems to specify the path to a script file that will be used to start Managed Servers. | ./nodemanager.sh |
If you change any of these properties, you must stop and restart the Node Manager for the changes to take effect.
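To see which values are in force once you override a few of these properties, you can overlay your own settings on the defaults from Table 13-2. The Python sketch below does only that; the names TABLE_DEFAULTS and effective_settings are hypothetical, and properties that the table lists without a single default value (JavaHome, WeblogicHome, and ListenAddress) are simply left out unless you supply them.

```python
from typing import Dict, Mapping

# Defaults copied from Table 13-2 (values kept as strings, as in a property file).
TABLE_DEFAULTS: Dict[str, str] = {
    "ListenPort": "5555",
    "NativeVersionEnabled": "true",
    "ReverseDnsEnabled": "false",
    "SavedLogsDirectory": "./NodeManagerLogs",
    "TrustedHosts": "./nodemanager.hosts",
    "ScavengerDelaySeconds": "60",
    "StartTemplate": "./nodemanager.sh",
}


def effective_settings(overrides: Mapping[str, str]) -> Dict[str, str]:
    """Overlay user-supplied properties on the table defaults."""
    merged = dict(TABLE_DEFAULTS)
    merged.update(overrides)
    return merged
```

For example, effective_settings({"ListenPort": "5556"}) changes only the listen port and leaves the scavenger delay at 60 seconds.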
13.7.1.5 Starting a Node Manager
In a production environment, it is very important that the Node Manager is running at all times. Without the Node Manager, there is no way to automatically start, restart, or kill Managed Servers. The simplest way to accomplish this is to ensure that it runs as a Unix daemon or Windows Service. The default installation process provides you with an option to install the Node Manager in this way. For the Windows platform, you can use two scripts located in the WL_HOME\server\bin directory to install and uninstall the service:
installNodeMgrSvc.cmd
This script installs the Node Manager as a Windows Service.
uninstallNodeMgrSvc.cmd
This script stops and uninstalls the Node Manager service.
Make sure that you first modify these scripts to include the system properties we described earlier. The WebLogic documentation provides additional information on more advanced configurations of the Node Manager and Windows Services.
In addition, you can start the Node Manager using the startNodeManager script, which is also located in the WL_HOME\server\bin directory. To check on the status of a Node Manager, select a machine node from the left pane of the Administration Console and then choose the Monitoring/Node Manager Status tab.
13.7.2 Configuring a Machine to Use a Node Manager
After installing, configuring, and running a Node Manager on each physical machine, you must configure the machines for the domain and assign server instances to these machines. This information tells WebLogic which Managed Servers run on which physical machines, and hence which servers are under the control of the Node Manager on that machine. This is a two-part process. First you have to define the machines and configure them to use the Node Manager, and then you have to assign Managed Servers to the machines.
Using the Administration Console, select the Machines node in the left pane to view all of the machines in the domain. Each machine entry should encapsulate the settings for a physical machine. Use the righthand pane to create a new machine or modify an existing machine entry. For each machine, select the Node Manager tab and enter the listen address and port used by the Node Manager on that machine.
Finally, you need to assign the machine to the Managed Servers. Use the Servers tab to select those servers that run on the chosen machine. You also can assign a machine to a server from the Configuration/General tab of that server. This assignment is used in other situations as well. For instance, in a clustered environment WebLogic will try to replicate session data onto a server that runs on separate hardware. It does this by treating the different machines in the domain as physically different pieces of hardware. The servers assigned to a machine then determine which servers in the cluster are collocated (and which aren't).
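The effect of machine assignments on session replication can be illustrated with a small Python sketch. This is not WebLogic's actual selection algorithm, which the text does not spell out; the function pick_secondary and the server and machine names are hypothetical, and the sketch assumes only the stated preference for a server on a different machine.

```python
from typing import Mapping, Optional


def pick_secondary(primary: str, server_to_machine: Mapping[str, str]) -> Optional[str]:
    """Prefer a replica on a different machine; fall back to a collocated one."""
    primary_machine = server_to_machine[primary]
    others = sorted(name for name in server_to_machine if name != primary)
    remote = [name for name in others if server_to_machine[name] != primary_machine]
    candidates = remote or others  # collocated servers are a last resort
    return candidates[0] if candidates else None
```

If ServerA and ServerB are both assigned to MachineA and ServerC to MachineB, the sketch picks ServerC as the secondary for ServerA, because ServerB is collocated with the primary.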
13.7.3 Configuring the Node Manager for a Managed Server
The final task is to configure each Managed Server so that the Node Manager can control it. Because the Node Manager does not rely on external scripts to remotely start and kill a Managed Server, the information found in the startup scripts needs to be configured for each server using the Administration Console. The information is then saved as part of the domain configuration. Select a Managed Server from the left pane, and then choose the Configuration/Remote Start tab to specify the following parameters:
- The home directory of your JDK
- The home directory of your BEA installation
- The root directory of the domain
- The classpath that should be used to start the server
- Any additional JVM arguments to use
- The security policy file to use
- The username and the password of a WebLogic user with administrative privileges
All of these settings mirror the environment variables used in the startWebLogic scripts; we already saw that some of them can take on default values assigned to the Node Manager. Note that the directory paths used in the preceding settings must be valid on the machine that hosts the Managed Server, and not the Administration Server. This data is sent to the Node Manager on that machine, which then starts up the Managed Server in a separate process.
13.7.4 Configuring Node Manager Behavior
By default, the Node Manager automatically restarts a server that fails or whose state it cannot determine. Once a Managed Server has failed, the Node Manager will try to restart it no more than twice within the next hour.
Table 13-3 lists the configuration settings available for monitoring the health of a Managed Server. You can modify these settings from the Administration Console. Select a Managed Server from the left pane, then select the Configuration/Health Monitoring tab.
Setting | Description | Default |
---|---|---|
Auto Restart | If you disable this option, the Node Manager will not attempt to restart a failed server. | true |
Auto Kill if Failed | If this is set to true, the Node Manager may kill the server process if the server's health is in the failed state, or when it cannot query the server for its health state. | false |
Restart Interval ; Max Restarts within Interval | The Node Manager will try to restart the server only within the specified restart interval period. If this time period is exceeded, no further attempts will be made. During the time period, the Node Manager will try no more than Max Restarts to restart the server. By default, the Node Manager makes no more than two attempts within an hour to restart a failed server. | 3600; 2 |
Health Check Interval | This setting determines the interval (in seconds) at which the Node Manager polls the server for its health state. | 180 |
Health Check Timeout | This setting determines the number of seconds to wait for a response from a health check. By default, if the timeout is reached, the Node Manager will kill the server process and attempt to restart the server. | 60 |
Restart Delay Seconds | This setting determines the number of seconds that the Node Manager will wait before trying to restart the server after killing it. This may be needed on some systems where killing the process does not immediately release all resources before the restart. | 0 |
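The interplay between Restart Interval and Max Restarts within Interval is easier to follow as code. The Python sketch below models the rule as the table describes it, under one assumption of ours: the interval starts at the first restart attempt and is never reset. The class name RestartBudget is hypothetical and is not part of any WebLogic API.

```python
from typing import Optional


class RestartBudget:
    """Tracks how many restart attempts remain within the restart interval."""

    def __init__(self, restart_interval: float = 3600, max_restarts: int = 2) -> None:
        self.restart_interval = restart_interval  # seconds (default from Table 13-3)
        self.max_restarts = max_restarts          # attempts (default from Table 13-3)
        self.window_start: Optional[float] = None
        self.attempts = 0

    def try_restart(self, now: float) -> bool:
        """Return True and record an attempt if a restart is still permitted."""
        if self.window_start is None:
            # Assumption: the interval begins with the first attempt.
            self.window_start = now
        if now - self.window_start > self.restart_interval:
            return False  # interval exceeded: no further attempts
        if self.attempts >= self.max_restarts:
            return False  # budget used up within the interval
        self.attempts += 1
        return True
```

With the default values, the first two calls to try_restart succeed and any later call fails, which matches the "no more than two attempts within an hour" behavior.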
13.7.5 Default Operation of the Node Manager
Once a Node Manager has been installed and configured on a machine and the Managed Servers have been configured, the Node Manager is finally ready for use. You interact with the Node Manager indirectly using the Administration Console or the weblogic.Admin tool. To use the Administration Console, select a Managed Server from the left pane and then choose the Control tab. You then will be able to start, suspend, resume, and shut down a server. We discuss the various shutdown options and the use of the weblogic.Admin tool in a later section.
13.7.5.1 Starting managed servers
Imagine that you try to start a Managed Server remotely. Let's say that you want to start ServerA in Figure 13-5. The Administration Server will receive the instruction and forward it to the Node Manager on the machine that is configured to host ServerA (i.e., MachineB). The Node Manager running on MachineB then will start the server. By default, if the Managed Server doesn't respond within 60 seconds (the Scavenger Delay), the Node Manager will set the server's state to UNKNOWN. If the server does start after this delay, the Node Manager will change this state to RUNNING.
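The state transitions just described can be summarized in a few lines of Python. This is only a model of the behavior in the text: the function reported_state is hypothetical, and the STARTING label for the period before the Scavenger Delay expires is a placeholder of ours, since the text names only the UNKNOWN and RUNNING states.

```python
from typing import Optional


def reported_state(
    elapsed: float,
    responded_at: Optional[float] = None,
    scavenger_delay: float = 60,
) -> str:
    """State shown for a server `elapsed` seconds after the start request.

    `responded_at` is the number of seconds after the request at which the
    server first responded, or None if it never responds.
    """
    if responded_at is not None and responded_at <= elapsed:
        return "RUNNING"   # the server has responded, even if late
    if elapsed >= scavenger_delay:
        return "UNKNOWN"   # no response within the Scavenger Delay
    return "STARTING"      # placeholder label: still within the delay
```

A server that first responds after 75 seconds is therefore reported as UNKNOWN between the 60-second and 75-second marks, and as RUNNING from then on.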
13.7.5.2 Suspending and stopping managed servers
Requests to suspend or stop managed servers don't proceed quite in the same fashion. The commands are issued directly to the Managed Servers from the Administration Server. Only if the Administration Server cannot reach a Managed Server does it dispatch the command to the appropriate Node Manager, which then forwards it to the Managed Server. Likewise, if a Managed Server does not respond to a shutdown request, the Node Manager can shut down the process forcibly (it records the process ID for this purpose).
13.7.5.3 Health monitoring
By default, the Node Manager checks the health status of each Managed Server every 180 seconds. If a Managed Server is in the failed state and its Auto Kill If Failed attribute is set to true, the Node Manager will kill and restart the process. By default, this attribute is set to false. The same occurs if a server fails to respond to three consecutive health queries.
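As a rough sketch of this decision, assuming nothing beyond what the paragraph above states, the following Python function returns the action taken after a health check. The function name health_action, the FAILED state label, and the action strings are all hypothetical.

```python
def health_action(
    state: str,
    auto_kill_if_failed: bool = False,
    missed_checks: int = 0,
    max_missed: int = 3,
) -> str:
    """Decide what happens to a Managed Server after a health check."""
    if missed_checks >= max_missed:
        # Three consecutive unanswered health queries trigger a kill and restart.
        return "kill_and_restart"
    if state == "FAILED" and auto_kill_if_failed:
        return "kill_and_restart"
    return "none"
```

Note that with the default setting (Auto Kill If Failed set to false), a server in the failed state is left alone as long as it keeps answering health queries.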
By default, the Node Manager will not restart a Managed Server more than twice within an hour. The frequency of restarts is governed by the Restart Interval and Max Restarts within Interval attributes.
It is worth stressing the following points on the use of a Node Manager:
- Managed Servers can be started, monitored, and shut down by a Node Manager only if it started the server. If you start a Managed Server manually, the Node Manager will not interact with the server at all.
- If a Node Manager itself fails, this won't affect servers running on that machine. However, you won't be able to monitor the health of the servers or automatically restart a server that is in poor health. For this reason, you should run the Node Manager as a service or daemon on your operating system.
- If you haven't enabled the Managed Server Independence mode for a Managed Server, you cannot restart a Managed Server without the Administration Server, even if the Node Manager is running on that machine already. For this reason, you should take whatever measures are necessary to ensure that the Administration Server is always available, and that it can be restarted if it ever fails.
13.7.6 Node Manager Logs
Three sets of logs are associated with the Node Manager. All of them are useful when you need to debug any problems with the Node Manager or when you need to set up a more comprehensive monitoring environment. A subset of the logs is available from the Administration Console. Choose a Managed Server from the left pane, and then select the Control/Remote Start Output tab.[6]
[6] In WebLogic 7.0, it's the Monitoring/Process Output tab.
Three sets of logs are maintained for each Node Manager:
Node Manager client logs
The Administration Server maintains Node Manager log files in the NodeManagerClientLogs directory of the domain. These logs hold information about the commands directed to the Node Manager via the Administration Console (or the weblogic.Admin tool).
Node Manager logs
The Node Manager itself generates log messages when it starts up or shuts down. These logs are located in the WL_HOME/common/nodemanager/NodeManagerLogs/NodeManagerInternal directory on the particular machine. Use these log files to diagnose whether a Node Manager is not starting up properly. These logs essentially correspond to the View Node Manager Output option in the Administration Console.
Managed Server logs
The Node Manager maintains a subdirectory under the NodeManagerLogs directory, for each Managed Server that it controls. These log files hold the full output of the server that was started. These logs correspond to the View Server Output option in the Administration Console.
You may need to clean these directories periodically as the number and size of log files continue to grow.
13.7.6.1 Node Manager client logs
The client logs record all actions executed by a Node Manager on behalf of a JMX-based client, such as the Administration Console or the weblogic.Admin tool. A separate directory is created within the domain log directory for each server within the domain. All of the recorded actions are timestamped and usually include a notification of the success or failure of the action. Here's a typical example of the client logs:
<05-Jul-2003 14:07:05 BST> <10-Jul-2003 13:12:38 BST> <10-Jul-2003 13:14:50 BST> <_ _COMMAND_DONE_ _>
These logs contain only actions that were submitted through the Administration Console or any JMX-based client. For instance, if the Node Manager automatically restarts a failed server, this action is not recorded in the logs. Instead, it will be recorded in the machine logs for the Node Manager in charge of that server.
13.7.6.2 Managed Server logs
The server logs also are organized into subfolders, one for each server running on the machine. Each directory contains the following files:
servername_pid
This file contains, in text, the process ID of the Managed Server. If a Managed Server on a machine is using all of the CPU for some reason, you can trace the error to the actual server by grepping through these files. The Node Manager in turn uses this data to kill the process.
servername_output.log
This file records startup messages saved by the Node Manager when it starts a server.
servername_error.log
This file records any error messages that are generated when the Node Manager starts a server.
config.xml
This file contains any configuration information passed to the Node Manager by the Administration Server and can be safely ignored.
Except for the configuration file, all of the Managed Server log files are renamed by appending _prev to the filename whenever a server is restarted.
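Instead of grepping by hand, you could map a process ID back to its Managed Server with a few lines of Python. The sketch below assumes the layout described above: one subdirectory per server under the logs directory, each holding a file whose name is the server name followed by _pid and whose content is the process ID as text. The function find_server_by_pid is hypothetical.

```python
from pathlib import Path
from typing import Optional, Union


def find_server_by_pid(logs_dir: Union[str, Path], pid: int) -> Optional[str]:
    """Return the name of the Managed Server whose pid file records `pid`."""
    for pid_file in Path(logs_dir).glob("*/*_pid"):
        try:
            recorded = int(pid_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed pid file: skip it
        if recorded == pid:
            # Strip the trailing "_pid" to recover the server name.
            return pid_file.name[: -len("_pid")]
    return None
```

For example, if the operating system shows that process 4242 is consuming all of the CPU, find_server_by_pid("./NodeManagerLogs", 4242) returns the name of the matching server, or None if no pid file records that process.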