High-Speed Interconnects

To get the best response time and the highest throughput, the most important physical factor is the network interconnect.

NDB currently supports three different connection methods, also called transports: TCP/IP, shared memory, and SCI. It is possible to mix different transports within the same cluster, but it is recommended that all the data nodes use the same transport among themselves, if possible. If you don't specify a transport, all the nodes use TCP/IP.

TCP/IP

TCP/IP is by far the most common connection protocol used in NDB. This is due to the fact that it is the default and also is normally preconfigured on most servers. The management node always makes use of TCP/IP. It cannot use shared memory or SCI. Normally, the fact that it can't use a high-speed interconnect should not be an issue because there isn't very much data being transferred between the management node and others.

TCP/IP is available over many different media. For a basic MySQL Cluster setup, Gigabit Ethernet is the minimum recommended connection. It is possible to run a cluster on less than that, such as a 100Mbps network, but doing so definitely affects performance.

Generally, you want your networking hardware to be high quality. It is possible to get network cards that handle much of the TCP/IP processing themselves. This offloads CPU work from your nodes, and it also generally lowers network latency slightly. Each little bit that you can save on network latency helps.

If you are looking for even higher performance over TCP/IP, it is possible to use faster network or clustering interconnects. For example, there is now 10 Gigabit Ethernet, which may increase performance over Gigabit Ethernet. There are also special clustering interconnects, such as Myrinet, a network technology commonly used in clusters. Because Myrinet can carry TCP/IP, it can be used with MySQL Cluster through the TCP/IP transport as well.

When examining the more expensive options, we highly recommend that you acquire test hardware if possible. Many of these interconnects cost many times what commodity hardware, such as Gigabit Ethernet, costs. In some applications, interconnects can make a large performance difference, but in others, they may not make a very great difference. If the performance isn't much better, it might be wiser to spend more of your budget on something else, such as more nodes.

Shared Memory

MySQL Cluster can make use of shared memory connections. Shared memory connections are also referred to as the SHM transport. This type of connection works only when two nodes reside on the same physical machine. Most commonly, this is between a MySQL server and a data node that reside locally, but it can be used between data nodes as well.

Generally, shared memory connections are faster than TCP/IP local connections. Shared memory connections require more CPU, however, so they are not always faster due to CPU bottlenecks. Whether a CPU bottleneck decreases performance is highly application specific. We recommend that you do some benchmark testing between shared memory and TCP/IP local connections to see which is better for your particular application.

In order to use the NDB shared memory transport, you need to ensure that the version you are using has the transport compiled in. Unfortunately, not all the operating systems that support MySQL Cluster contain the necessary pieces to support shared memory connections. If you are compiling yourself, you need to compile MySQL with the option --with-ndb-shm.

When you are sure that your platform contains the transport, you need to configure MySQL Cluster to use it. There are two different ways to configure shared memory connections. The first option is designed to allow for more automatic configuration of shared memory. The second requires manual configuration but is a bit more flexible.

The first option is controlled by a startup option for the MySQL server called ndb-shm. When you start mysqld with this option, it causes mysqld to automatically attempt to use shared memory for connections, if possible. Obviously, this works only when connecting to local data nodes, but you can run with this option regardless of whether there are any local data nodes.
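As a rough sketch of the first option, the startup option can be placed in the MySQL server's option file rather than on the command line. This assumes the server binary was built with the shared memory transport and that your version recognizes ndb-shm; the host name in the connect string is a placeholder, not a value from this chapter.

```ini
# Hypothetical my.cnf fragment for a MySQL server on the same machine as a data node
[mysqld]
# Enable the NDB storage engine
ndbcluster
# Placeholder management node address (assumption for illustration)
ndb-connectstring=mgm_host
# Ask mysqld to try shared memory for connections to local data nodes
ndb-shm
```

If there are no local data nodes, the option should have no effect and the server connects over TCP/IP as usual.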

The second option is to set up shared memory by putting it in the cluster configuration file (normally config.ini). For each shared memory connection you want, you need to add a [SHM] group. The required settings are the two node IDs involved and a unique integer identifier. Here is an example:

[SHM]
NodeId1=3
NodeId2=4
ShmKey=324

NodeId1 and NodeId2 refer to the two nodes you want to communicate over shared memory. You need a separate [SHM] section for each pair of nodes. For example, if you have three local data nodes, you need to define three [SHM] sections (that is, 1-2, 2-3, and 1-3).

You may want to define some optional settings as well. The most common setting is called ShmSize. It designates the size of the shared memory segment to use for communication. The default is 1MB, which is good for most applications. However, if you have a very heavily used cluster, it could make sense to increase it a bit, such as to 2MB or 4MB.
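To make the manual approach concrete, the following sketch shows the three [SHM] sections needed for three data nodes on one machine, with the segment size raised for a busy cluster. The node IDs (3, 4, and 5), the ShmKey values, and the 2M size are assumptions chosen for the example, not recommended values; each ShmKey only needs to be an integer that no other connection uses.

```ini
# Hypothetical config.ini fragment: three local data nodes with IDs 3, 4, and 5

# Connection between nodes 3 and 4
[SHM]
NodeId1=3
NodeId2=4
ShmKey=324
ShmSize=2M

# Connection between nodes 4 and 5
[SHM]
NodeId1=4
NodeId2=5
ShmKey=325
ShmSize=2M

# Connection between nodes 3 and 5
[SHM]
NodeId1=3
NodeId2=5
ShmKey=326
ShmSize=2M
```

Benchmark before and after changing ShmSize, because a larger segment helps only if the default 1MB is actually the limiting factor.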

SCI

Scalable Coherent Interface (SCI) is the final transport that MySQL Cluster natively supports. SCI is a cluster interconnect that is used commonly for all types of clusters (not only MySQL Cluster). The vendor that provides this hardware is called Dolphin Interconnect Solutions, Inc. (www.dolphinics.com).

According to MySQL AB's testing, SCI normally gives almost a 50% reduction in response time for network-bound queries. Of course, this depends on the queries in question, but it generally offers increased performance.

There are two ways you can use this interconnect. The first is through the previously mentioned TCP/IP transport, which involves an interface called SCI Sockets. This interface is distributed by Dolphin Interconnect Solutions and is available for Linux only. This is the preferred method for use between the MySQL server and data nodes. From a MySQL Cluster perspective, the network is just set up to use normal TCP/IP, but the SCI Sockets layer translates the TCP/IP into communication over SCI. For configuration information, see the SCI Sockets website, at www.dolphinics.com.

The second way to use SCI is through the native NDB transport designed for it, which uses the native SCI API. To make use of this, you need to compile MySQL Cluster with support for it, using the option --with-ndb-sci. This is the normal method to use between data nodes as it is normally faster than using SCI Sockets.

To use the native SCI transport, you need to set up your cluster configuration file (config.ini) with the connection information. For each SCI connection, you need to create a section called [SCI]. The options to use in this section are described in the following sections.

NodeId1 and NodeId2

The two options NodeId1 and NodeId2 define which nodes are involved in this connection. You need to define a new [SCI] section for each pair of nodes. For example, if you have four nodes that all use SCI, you need to define six sections: 1-2, 1-3, 1-4, 2-3, 2-4, and 3-4. With a large number of data nodes (such as eight or more), you might have quite a few sections. The number of sections required is the number of combinations of the data nodes taken two at a time (nC2), which is n(n-1)/2; eight data nodes, for example, need 28 sections.

Host1SciId0, Host1SciId1, Host2SciId0, and Host2SciId1

The values Host1SciId0, Host1SciId1, Host2SciId0, and Host2SciId1 designate which SCI devices to use. Each node needs at least one device defined (Host1SciId0 for the first node and Host2SciId0 for the second) to tell it how to communicate. You can, however, define two devices for each node. SCI supports automatic failover if you have multiple SCI communication devices installed and configured. The actual ID number is set on the SCI card.

SharedBufferSize

The optional parameter SharedBufferSize specifies how much memory to devote on each end for using SCI. The default is 1MB, which should be good for most applications. Lowering this parameter can potentially lead to crashes or performance issues. In some cases, you might get better performance by increasing this parameter to slightly larger sizes.

The following is an example of a single entry (remember that you will quite often need more than just one):

[SCI]
NodeId1=5
NodeId2=7
Host1SciId0=8
Host2SciId0=16
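If each host has two SCI cards, the same section can name a second device per node so that failover is possible. The sketch below assumes the second-device parameters are spelled Host1SciId1 and Host2SciId1; the device IDs 12 and 20 and the 2M buffer size are made up for illustration, so substitute the IDs actually set on your cards.

```ini
# Hypothetical config.ini fragment: one SCI connection with two devices per node
[SCI]
NodeId1=5
NodeId2=7
# Primary SCI device on each host
Host1SciId0=8
Host2SciId0=16
# Secondary SCI device on each host, used for failover (IDs are assumptions)
Host1SciId1=12
Host2SciId1=20
# Optional: raise the shared buffer above the 1MB default (value is an assumption)
SharedBufferSize=2M
```

A section like this is still needed for every pair of nodes that communicates over SCI.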

 

Adding More Transports

It is possible to implement new transports relatively easily because the transport layer is separated from the logical protocol layer. Doing so requires writing C++ code. Although most people do not have to do this, it might prove advantageous in some cases. How to actually do it is beyond the scope of this book, but to get started, we recommend that you do two things. First, refer to the source code in the directory called ndb/src/common/transporter, which is part of the MySQL source code distribution, and pay close attention to the class Transporter in transporter.hpp. Second, if you need assistance, you can ask on cluster@lists.mysql.com or through your support contract, if you have one.
