
8.1. MPI AND HETEROGENEOUS NETWORKS OF COMPUTERS

In Section 4.2 we presented MPI as the standard message-passing library for parallel programming of homogeneous distributed-memory architectures. In practice, MPI is also widely used for parallel programming of NoCs. There are several reasons for the popularity of MPI among programmers developing parallel applications for NoCs:

  1. There are two free high-quality implementations of MPI, LAM MPI and MPICH, that support cross-platform MPI applications. For example, if an NoC consists of computers running different clones of Unix, such as Linux, Solaris, HP-UX, and IRIX, then, having installed such an MPI implementation on each computer of the network, the users can develop and execute MPI programs that run across the computers of the heterogeneous NoC.

  2. The standard MPI encapsulates the problem of different data representations on processors of different architectures. An MPI implementation can properly convert data communicated between processors of different architectures: on the sender side, the data are converted to a machine-independent form; they are transferred to the receiver in this machine-independent form; and on the receiver side they are converted to the receiver’s machine-specific form. The programmer only specifies MPI datatypes (see the sketch after this list).

  3. While very well designed and easy to understand, the MPI communication model is low-level enough to allow efficient code to be written for any NoC.
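As a concrete illustration of point 2, the following minimal sketch (our own, not taken from the MPI standard or this book) sends an array of doubles from one process to another that may run on a machine with a different data representation. The programmer only names the MPI datatype; the MPI implementation performs whatever representation conversion is required.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double buf[4] = {1.0, 2.0, 3.0, 4.0};
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            /* Sender: the library may convert the data to a
               machine-independent form before transfer. */
            MPI_Send(buf, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receiver: the data arrive already converted to this
               machine's representation; no conversion code is written. */
            MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            printf("received %.1f %.1f %.1f %.1f\n",
                   buf[0], buf[1], buf[2], buf[3]);
        }
        MPI_Finalize();
        return 0;
    }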

However, the standard MPI does not address the additional challenges posed by heterogeneous NoCs. We analyzed some of these challenges in Chapter 5 and revisit the three main ones here.

First, the standard MPI library does not employ multiple network protocols between different pairs of processors for efficient communication within the same MPI application. The only exception is the combined use of shared memory and TCP/IP in the MPICH implementation of MPI: if two processes of an MPI program run on the same SMP computer, they communicate via shared memory; if they run on different computers, they communicate via the TCP/IP protocol. Some research efforts have addressed this challenge, such as the Nexus research implementation of MPI.

Second, the standard MPI library does not allow programmers to write fault-tolerant parallel applications for NoCs. In Section 5.3.2 we outlined the research efforts made to add fault tolerance to MPI applications. The most recent result is FT-MPI, an explicit fault-tolerant MPI that extends the standard MPI interface and semantics. FT-MPI provides application programmers with methods of dealing with failures within an MPI application other than checkpoint and restart. It allows the semantics and associated modes of failures to be explicitly controlled by the application via the modified MPI API. FT-MPI supports atomic communications, and the level of correctness can be varied for individual communicators, enabling users to fine-tune for coherency or performance as system and application conditions dictate.
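The sketch below uses only standard MPI facilities, not the FT-MPI API itself: installing the predefined error handler MPI_ERRORS_RETURN makes communication calls return an error code instead of aborting the whole job. This is the point of departure that fault-tolerant extensions such as FT-MPI build on and refine with communicator-level failure modes. (MPI_Errhandler_set is the MPI-1 name of this call; MPI-2 renamed it MPI_Comm_set_errhandler.)

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, rc;
        int token = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* By default errors are fatal (MPI_ERRORS_ARE_FATAL); switch to
           MPI_ERRORS_RETURN so the application itself sees failures. */
        MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        if (rank == 0) {
            rc = MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            if (rc != MPI_SUCCESS) {
                /* Application-level reaction to a communication failure;
                   FT-MPI extends this point with recovery semantics. */
                fprintf(stderr, "process 0: receive failed, rc = %d\n", rc);
            }
        } else if (rank == 1) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }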

Third, the standard MPI library does not provide features that facilitate writing parallel programs that distribute computations and communications unevenly, taking into account the speeds of the processors and the speeds and bandwidths of the communication links. In this chapter we present a research effort in this direction: a small set of extensions to MPI, called HMPI (Heterogeneous MPI), aimed at efficient parallel computing on heterogeneous NoCs. HMPI is essentially an adaptation of the mpC language to the MPI programming level.
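To illustrate the kind of support that is missing from the standard MPI, the following plain-MPI sketch (it does not use the HMPI API introduced later in this chapter) partitions N loop iterations over the processes unevenly, in proportion to relative processor speeds that the programmer is assumed to have measured beforehand. The speed values and the assumption of exactly four processes are hypothetical.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000

    int main(int argc, char *argv[])
    {
        /* Hypothetical relative speeds of the four processors of the NoC,
           obtained by the programmer ahead of time. */
        static const double speed[] = {3.0, 1.0, 2.0, 4.0};
        int rank, size, i;
        double total = 0.0, acc = 0.0;
        int first, last;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size != 4) {
            if (rank == 0)
                fprintf(stderr, "this sketch assumes 4 processes\n");
            MPI_Finalize();
            return 1;
        }

        for (i = 0; i < size; i++)
            total += speed[i];

        /* Each process computes its own contiguous block [first, last)
           whose length is proportional to its relative speed. */
        for (i = 0; i < rank; i++)
            acc += speed[i];
        first = (int)(N * acc / total);
        last  = (int)(N * (acc + speed[rank]) / total);

        printf("process %d computes iterations %d..%d\n",
               rank, first, last - 1);

        /* ... perform the local part of the computation here ... */

        MPI_Finalize();
        return 0;
    }

The intent behind HMPI, as developed in the rest of this chapter, is to relieve the programmer of hand-coding such speed-dependent distributions.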
