Mac OS X Internals: A Systems Approach
9.2. Mach IPC: An Overview
Mach provides a message-oriented, capability-based IPC facility that represents an evolution of similar approaches used by Mach's precursors, namely, Accent and RIG. Mach's IPC implementation uses the VM subsystem to efficiently transfer large amounts of data using copy-on-write optimizations. The Mac OS X kernel uses the general message primitive provided by Mach's IPC interface as a low-level building block. In particular, the Mach calls mach_msg() and mach_msg_overwrite() can be used for both sending and receiving messages (in that order), allowing RPC[2]-style interaction as a special case of IPC. This type of RPC is used for implementing several system services in Mac OS X.
[2] Remote procedure call.
[3] "A System for Interprocess Communication in a Resource Sharing Computer Network," by David C. Walden (Communications of the ACM 15:4, April 1972, pp. 221-230).
The Mach IPC facility is built on two basic kernel abstractions: ports and messages, with messages passing between ports as the fundamental communication mechanism. A port is a multifaceted entity, whereas a message is an arbitrarily sized collection of data objects.
9.2.1. Mach Ports
Mach ports serve the following primary purposes in the operating system.
A port's name can stand for several entities, such as a right for sending or receiving messages, a dead name, a port set, or nothing. In general, we refer to what a port name stands for as a port right, although the term right may seem unintuitive in some situations. We will discuss details of these concepts later in this chapter.
9.2.1.1. Ports for Communication
In its role as a communications channel, a Mach port resembles a BSD socket, but there are important differences, such as those listed here.
When we talk of a message being sent to a task, we mean that the message is sent to a port that the recipient task has receive rights to. The message is dequeued by a thread within the recipient task.
Integration of IPC with virtual memory allows messages to be mapped (copy-on-write, if possible and appropriate) into the receiving task's address space. In theory, a message could be as large as the size of a task's address space.
Although the Mach kernel itself does not include any explicit support for distributed IPC, communication can be transparently extended over the network by using external (user-level) tasks called Network Servers, which simply act as local proxies for remote tasks. A message sent to a remote port will be sent to a local Network Server, which is responsible for forwarding it to a Network Server on the remote destination machine. The participant tasks are unaware of these details, hence the transparency.
Although the xnu kernel retains most of the semantics of Mach IPC, network-transparent Mach IPC is not used on Mac OS X.
9.2.1.2. Port Rights
The following specific port right types are defined on Mac OS X.
A port is considered to be destroyed when its receive right is deallocated. Although existing send or send-once rights will transform into dead names when this happens, existing messages in the port's queue are destroyed, and any associated out-of-line memory is freed.
The following are some noteworthy aspects of port rights.
9.2.1.3. Ports as Objects
The Mach IPC facility is a general-purpose object-reference mechanism that uses ports as protected access points. In semantic terms, the Mach kernel is a server that serves objects on various ports. This kernel server receives incoming messages, processes them by performing the requested operations, and, if required, sends a reply. This approach allows a more general and useful implementation of several operations that have been historically implemented as intraprocess function calls. For example, one Mach task can allocate a region of virtual memory in another task's address space, if permitted, by sending an appropriate message to the port representing the target task. Note that the same model is used for accessing both user-level and kernel services. In either case, a task accesses the service by having one of its threads send messages to the service provider, which can be another user task or the kernel.
Besides message passing, little Mach functionality is exposed through Mach traps. Most Mach services are provided through message-passing interfaces. User programs typically access these services by sending messages to the appropriate ports.
We saw earlier that ports are used to represent both tasks and threads. When a task creates another task or a thread, it automatically gets access to the newly created entity's port. Since port ownership is task-level, all per-thread ports in a task are accessible to all threads within that task. A thread can send messages to other threads within its task, say, to suspend or resume their execution. It follows that having access to a task's port implicitly provides access to all threads within that task. The converse does not hold, however: Having access to a thread's port does not give access to its containing task's port.
9.2.1.4. Mach Port Allocation
A user program can acquire a port right in several ways, examples of which we will see later in this chapter. A program creates a new port right through the mach_port_allocate family of routines, of which mach_port_allocate() is the simplest:
    int
    mach_port_allocate(ipc_space_t       task,   // task acquiring the port right
                       mach_port_right_t right,  // type of right to be created
                       mach_port_name_t *name);  // returns name for the new right
We will discuss details of port allocation in Section 9.3.5.
9.2.2. Mach IPC Messages
Mach IPC messages can be sent and received through the mach_msg family of functions. The fundamental IPC system call in Mac OS X is a trap called mach_msg_overwrite_trap() [osfmk/ipc/mach_msg.c], which can be used for sending a message, receiving a message, or both sending and receiving (in that order, that is, an RPC) in a single call.
    // osfmk/ipc/mach_msg.c
    mach_msg_return_t
    mach_msg_overwrite_trap(
        mach_msg_header_t  *snd_msg,         // message buffer to be sent
        mach_msg_option_t   option,          // bitwise OR of commands and modifiers
        mach_msg_size_t     send_size,       // size of outgoing message buffer
        mach_msg_size_t     rcv_size,        // maximum size of receive buffer (rcv_msg)
        mach_port_name_t    rcv_name,        // port or port set to receive on
        mach_msg_timeout_t  timeout,         // timeout in milliseconds
        mach_port_name_t    notify,          // receive right for a notify port
        mach_msg_header_t  *rcv_msg,         // message buffer for receiving
        mach_msg_size_t     scatterlist_sz); // size of scatter list control info
The behavior of mach_msg_overwrite_trap() is controlled by setting the appropriate bits in the option argument. These bits determine what the call does and how it does it. Some bits cause the call to use one or more of the other arguments, which may be unused otherwise. The following are some examples of individual bits that can be set in option.
The header file osfmk/mach/message.h contains the full set of modifiers that can be used with the mach_msg family of functions.
Another Mach trap, mach_msg_trap(), simply calls mach_msg_overwrite_trap() with zeros as the last two arguments; it uses the same buffer when the call is used for both sending and receiving, so the rcv_msg argument is not needed.
The scatterlist_sz argument is used when the receiver, while receiving an out-of-line message (see Section 9.5.5), wants the kernel not to dynamically allocate memory in the receiver's address space but to overwrite one or more preexisting valid regions with the received data. In this case, the caller describes which regions to use through out-of-line descriptors in the ingoing rcv_msg argument, and scatterlist_sz specifies the size of this control information.
The system library provides user-level wrappers around the messaging traps (Figure 9-1). The wrappers handle possible restarting of the appropriate parts of IPC operations in the case of interruptions.
Figure 9-1. System library wrappers around Mach messaging traps
User programs normally use mach_msg() or mach_msg_overwrite() to perform IPC operations. Variants such as mach_msg_receive() and mach_msg_send() are other wrappers around mach_msg().
The anatomy of a Mach message has evolved over time, but the basic layout consisting of a fixed-size header[4] and other variable-size data has remained unchanged. Mach messages in Mac OS X contain the following parts:
[4] Note that unlike an Internet Protocol packet header, the send- and receive-side headers are not identical for a Mach IPC message.
A message can be either simple or complex. A simple message contains a header immediately followed by untyped data, whereas a complex message contains a structured message body. Figure 9-2 shows how the parts of a complex Mach message are laid out. The body consists of a descriptor count followed by that many descriptors, which are used to transfer out-of-line memory and port rights. Removing the body from this picture gives the layout of a simple message.
Figure 9-2. The layout of a complex Mach message
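The simple-versus-complex layout described above can be sketched in portable C. This is only an illustrative model, not the kernel's definitions: the model_* types are hypothetical stand-ins, all fields are assumed to be 32 bits wide, the descriptor is an opaque 12-byte placeholder, and the value of the "complex" flag is assumed to mirror MACH_MSGH_BITS_COMPLEX in osfmk/mach/message.h. On a real system one would use the types from <mach/message.h> instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the "complex" flag in msgh_bits; the value is an assumption
 * meant to mirror MACH_MSGH_BITS_COMPLEX in osfmk/mach/message.h. */
#define MODEL_MSGH_BITS_COMPLEX 0x80000000u

/* Hypothetical stand-in for the fixed-size header (all fields 32 bits). */
typedef struct {
    uint32_t msgh_bits;
    uint32_t msgh_size;
    uint32_t msgh_remote_port;
    uint32_t msgh_local_port;
    uint32_t msgh_reserved;
    int32_t  msgh_id;
} model_header_t;

/* Body of a complex message: a count, followed by that many descriptors. */
typedef struct {
    uint32_t msgh_descriptor_count;
} model_body_t;

/* Opaque placeholder for one descriptor; the size is chosen for
 * illustration only. Real descriptors have typed fields. */
typedef struct {
    uint32_t words[3];
} model_descriptor_t;

/* Returns the byte offset at which the untyped inline data begins.
 * Simple message:  header | data
 * Complex message: header | count | descriptors... | data */
static size_t model_data_offset(const void *msg)
{
    const model_header_t *hdr = msg;
    const model_body_t *body;

    assert(msg != NULL);
    if ((hdr->msgh_bits & MODEL_MSGH_BITS_COMPLEX) == 0)
        return sizeof(model_header_t);

    /* In a complex message, the body starts right after the header. */
    body = (const model_body_t *)(hdr + 1);
    return sizeof(model_header_t) + sizeof(model_body_t)
         + (size_t)body->msgh_descriptor_count * sizeof(model_descriptor_t);
}
```

With these stand-in widths, a simple message's data starts right after the header, whereas in a complex message carrying two descriptors it starts after the header, the count, and both descriptors.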
9.2.2.1. Message Header
The meanings of the message header fields are as follows.
The msgh_remote_port and msgh_local_port values are swapped (reversed with respect to the sender's view) in the message header seen by the recipient. Similarly, the bits in msgh_bits are also reversed.
9.2.2.2. Message Body
A nonempty message body may contain data that is passive (uninterpreted by the kernel), active (processed by the kernel), or both. Passive data resides inline in the message body and is meaningful only to the sender and the recipient. Examples of active data include port rights and out-of-line memory regions. Note that a message that carries anything but inline passive data is a complex message.
As noted earlier, a complex message body contains a descriptor count followed by that many descriptors. Figure 9-3 shows some descriptor types that are available for carrying different types of content.
Figure 9-3. Descriptors for sending ports and out-of-line memory in Mach IPC messages
A mach_msg_port_descriptor_t is used for passing a port right. Its name field specifies the name of the port right being carried in the message, whereas the disposition field specifies the IPC processing to be performed for the right, based on which the kernel passes the appropriate right to the recipient. The following are examples of disposition types.
A mach_msg_ool_descriptor_t is used for passing out-of-line memory. Its address field specifies the starting address of the memory in the sender's address space, whereas the size field specifies the memory's size in bytes. If the deallocate Boolean value is true, the set of pages containing the data will be deallocated in the sender's address space after the message is sent. The copy field is used by the sender to specify how the data is to be copied: either virtually (MACH_MSG_VIRTUAL_COPY) or physically (MACH_MSG_PHYSICAL_COPY). The recipient uses the copy field to specify whether to dynamically allocate space for the received out-of-line memory regions (MACH_MSG_ALLOCATE) or to write over existing specified regions of the receiver's address space (MACH_MSG_OVERWRITE). As far as possible, and unless explicitly overridden, memory transferred in this manner is shared copy-on-write between senders and recipients.
Once a send call returns, the sender can modify the message buffer used in the send call without affecting the message contents. Similarly, the sender can also modify any out-of-line memory regions transferred.
A mach_msg_ool_ports_descriptor_t is used to pass an out-of-line array of ports. Note that such an array is always physically copied while being sent.
9.2.2.3. Message Trailer
A received Mach message contains a trailer after the message data. The trailer is aligned on a natural boundary. The msgh_size field in the received message header does not include the size of the received trailer. The trailer itself contains the trailer size in its msgh_trailer_size field.
The kernel may provide several trailer formats, and within each format, there can be multiple trailer attributes. Mac OS X 10.4 provides only one trailer format: MACH_MSG_TRAILER_FORMAT_0. This format provides the following attributes (in this order): a sequence number, a security token, and an audit token. During messaging, the receiver can request the kernel to append one or more of these attributes as part of the received trailer on a per-message basis. However, there is a caveat: To include a later attribute in the trailer, the receiver must accept all previous attributes, where the later/previous qualifiers are with respect to the aforementioned order. For example, including the audit token in the trailer will automatically include the security token and the sequence number.
The following types are defined to represent valid combinations of trailer attributes:
A security token is a structure containing the effective user and group IDs of the sending task (technically, of the associated BSD process). These are populated by the kernel securely and cannot be spoofed by the sender. An audit token is an opaque object that identifies the sender of a Mach message as a subject to the kernel's BSM auditing subsystem. It is also filled in securely by the kernel. Its contents can be interpreted using routines in the BSM library.
A task inherits its security and audit tokens from the task that creates it. A task without a parent (i.e., the kernel task) has its security and audit tokens set to KERNEL_SECURITY_TOKEN and KERNEL_AUDIT_TOKEN, respectively. These are declared in osfmk/ipc/mach_msg.c. As the kernel evolves, it is likely that other types of tokens that include more comprehensive information could be supported.
Figure 9-4 shows an example of how to request the kernel to include the security token in the trailer of a received message.
Figure 9-4. Requesting the kernel to include the sender's security token in the message trailer
The MACH_RCV_TRAILER_ELEMENTS() macro is used to encode the number of trailer elements desired; valid numbers are defined in osfmk/mach/message.h:
    #define MACH_RCV_TRAILER_NULL   0 // mach_msg_trailer_t
    #define MACH_RCV_TRAILER_SEQNO  1 // mach_msg_trailer_seqno_t
    #define MACH_RCV_TRAILER_SENDER 2 // mach_msg_security_trailer_t
    #define MACH_RCV_TRAILER_AUDIT  3 // mach_msg_audit_trailer_t
Note that the receive buffer must contain sufficient space to hold the requested trailer type.
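To make the space requirement concrete, the following sketch models how the trailer grows with the number of requested elements. Everything prefixed with model_ or MODEL_ is a hypothetical stand-in rather than the kernel's definition: each field is assumed to be 32 bits wide, the security token is assumed to hold two words and the audit token eight, and the bit position used by MODEL_RCV_TRAILER_ELEMENTS() is assumed to mirror the real macro. A real program would take sizeof on the types from <mach/message.h> instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element counts, as listed in osfmk/mach/message.h. */
enum {
    MODEL_TRAILER_NULL   = 0,
    MODEL_TRAILER_SEQNO  = 1,
    MODEL_TRAILER_SENDER = 2,
    MODEL_TRAILER_AUDIT  = 3
};

/* Assumed encoding of the element count into the receive options. */
#define MODEL_RCV_TRAILER_ELEMENTS(x) (((uint32_t)(x) & 0xfu) << 24)

/* Stand-in trailer layouts. Each later attribute extends the previous
 * layout, which is why requesting one pulls in all earlier ones. */
typedef struct {
    uint32_t msgh_trailer_type;
    uint32_t msgh_trailer_size;
} model_trailer_t;

typedef struct {
    model_trailer_t base;
    uint32_t        msgh_seqno;       /* sequence number */
} model_trailer_seqno_t;

typedef struct {
    model_trailer_seqno_t base;
    uint32_t              msgh_sender[2];  /* security token (assumed size) */
} model_security_trailer_t;

typedef struct {
    model_security_trailer_t base;
    uint32_t                 msgh_audit[8]; /* audit token (assumed size) */
} model_audit_trailer_t;

/* Size of the trailer the receiver must leave room for. */
static size_t model_trailer_size(unsigned elements)
{
    assert(elements <= (unsigned)MODEL_TRAILER_AUDIT);
    switch (elements) {
    case MODEL_TRAILER_SEQNO:  return sizeof(model_trailer_seqno_t);
    case MODEL_TRAILER_SENDER: return sizeof(model_security_trailer_t);
    case MODEL_TRAILER_AUDIT:  return sizeof(model_audit_trailer_t);
    default:                   return sizeof(model_trailer_t);
    }
}

/* Receive buffer size: largest expected message plus the requested trailer. */
static size_t model_receive_buffer_size(size_t max_message_size, unsigned elements)
{
    return max_message_size + model_trailer_size(elements);
}
```

Under these assumed widths the four trailer types occupy 8, 12, 20, and 52 bytes, so a receiver that asks for the security token must size its buffer 20 bytes beyond the largest message it expects.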
In a client-server system, both the client and the server can request the other party's security token to be appended to the incoming message trailer.