RTP: Audio and Video for the Internet

2017-07-07 02:10:07

Multiplexing has been an area of some controversy, and considerable discussion, within the IETF. Although TCRTP is the recommended best current practice, there are other proposals that merit further discussion. These include Generic RTP Multiplexing (GeRM), which is one of the few alternatives to TCRTP that maintains RTP semantics, and several application-specific multiplexes.

GeRM

Generic RTP Multiplexing (GeRM) was proposed at the IETF meeting in Chicago in August 1998 but was never developed into a complete protocol specification.⁴⁵ GeRM uses the ideas of RTP header compression, but instead of compressing the headers between packets, it applies compression to multiple payloads multiplexed within a single packet. All compression state is reinitialized in each new packet, and as a result, GeRM can function effectively end-to-end.

CONCEPTS AND PACKET FORMAT

Figure 12.4 shows the basic operation of GeRM. A single RTP packet is created, and multiple RTP packets, known as subpackets, are multiplexed inside it. Each GeRM packet has an outer RTP header that contains the header fields of the first subpacket, but the RTP payload type field is set to a value indicating that this is a GeRM packet.

Figure 12.4. A GeRM Packet Containing Three Subpackets

The first subpacket header will compress completely except for the payload type field and length because the full RTP header and the subpacket header differ only in the payload type. The second subpacket header will then be encoded on the basis of predictable differences between the original RTP header for that subpacket and the original RTP header for the first subpacket. The third subpacket header is then encoded relative to the original RTP header for the second subpacket, and so forth. Each subpacket header comprises a single mandatory octet, followed by several optional extension octets, as shown in Figure 12.5.

Figure 12.5. GeRM Subpacket Header

The meanings of the bits in the mandatory octet are as detailed here:

B0: Zero indicates that the first octet of the original RTP header remains unchanged from the original RTP header in the previous subpacket (or outer RTP header if there's no previous subpacket in this packet). That is, V, CC, and P are unchanged. One indicates that the first octet of the original RTP header immediately follows the GeRM header.

B1: This bit contains the marker bit from the subpacket's RTP header.

B2: Zero indicates that the payload type remains unchanged. One indicates that the payload type field follows the GeRM header and any first-octet header that may be present. Although PT is a seven-bit field, it is added as an eight-bit field. Bit 0 of this field is always zero.

B3: Zero indicates that the sequence number remains unchanged. One indicates that the 16-bit sequence number field follows the GeRM header and any first-octet or PT header that may be present.

B4: Zero indicates that the timestamp remains unchanged. One indicates that the 32-bit timestamp field follows the GeRM header and any first-octet, PT, or sequence number header that may be present.

B5: Zero indicates that the most significant 24 bits of the SSRC remain unchanged. One indicates that the most significant 24 bits of the SSRC follow the GeRM header and any first-octet, PT, sequence number, or timestamp field that may be present.

B6: Zero indicates that the least significant eight bits of the SSRC are one higher than the preceding SSRC. One indicates that the least significant eight bits of the SSRC follow the GeRM header and any first-octet, PT, sequence number, timestamp, or MSB SSRC header fields that may be present.

B7: Zero indicates that the subpacket length in bytes (ignoring the subpacket header) is unchanged from the previous subpacket. One indicates that the subpacket length (ignoring the subpacket header) follows all the other GeRM headers as an eight-bit unsigned integer length field. An eight-bit length field is sufficient because there is little to be gained by multiplexing larger packets.

Any CSRC fields present in the original RTP header then follow the GeRM headers. Following this is the RTP payload.
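The bit-by-bit description above can be sketched as a small parser. This is only an illustration under stated assumptions, not a reference implementation: GeRM was never fully specified, the function and field names are invented here, and the sketch assumes B0 is the most significant bit of the mandatory octet (the usual convention in IETF header diagrams). CSRC handling and reconstruction of elided fields from the previous subpacket are left out.

```python
# Sketch of parsing one GeRM subpacket header.
# ASSUMPTION: B0 is the most significant bit of the mandatory octet.
# ASSUMPTION: all names below are illustrative; GeRM has no official API.

# (bit number, field name, size in octets), in the order the optional
# fields follow the mandatory octet.
OPTIONAL_FIELDS = [
    (0, "first_octet", 1),      # V, P, X, CC octet of the original RTP header
    (2, "payload_type", 1),     # 7-bit PT carried in a full octet
    (3, "sequence_number", 2),
    (4, "timestamp", 4),
    (5, "ssrc_msb", 3),         # most significant 24 bits of the SSRC
    (6, "ssrc_lsb", 1),         # least significant 8 bits of the SSRC
    (7, "length", 1),           # subpacket payload length in octets
]


def bit(octet, n):
    """Return bit Bn of an octet, counting B0 as the most significant bit."""
    return (octet >> (7 - n)) & 1


def parse_subpacket_header(data):
    """Parse a subpacket header at the start of `data`.

    Returns (marker, fields, header_len), where `fields` holds only the
    optional fields that are explicitly present. Absent fields must be
    derived from the previous subpacket, which this sketch does not do.
    """
    if not data:
        raise ValueError("empty subpacket header")
    mandatory = data[0]
    marker = bit(mandatory, 1)  # B1 carries the RTP marker bit directly
    fields = {}
    offset = 1
    for n, name, size in OPTIONAL_FIELDS:
        if bit(mandatory, n):
            chunk = data[offset:offset + size]
            if len(chunk) < size:
                raise ValueError("truncated field: " + name)
            fields[name] = int.from_bytes(chunk, "big")
            offset += size
    return marker, fields, offset


if __name__ == "__main__":
    # Marker set (B1), sequence number present (B3), length present (B7).
    header = bytes([0b01010001, 0x12, 0x34, 0xA0])
    print(parse_subpacket_header(header))
```

A full demultiplexer would keep the reconstructed RTP header of the previous subpacket and apply the "unchanged" and "one higher" rules to every field that is not present.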

APPLICATION SCENARIOS

The bandwidth saving due to GeRM depends on the similarity of the headers between the multiplexed packets. Consider two scenarios: arbitrary packets and packets produced by cooperating applications.

If arbitrary RTP packets are to be multiplexed, the multiplexing gain is small. If there is no correlation between the packets, all the optional fields will be present and the subpacket header will be 14 octets in length. Compared to nonmultiplexed RTP, there is still a gain here because a 14-octet subheader is smaller than the 40-octet RTP/UDP/IP header that would otherwise be present, but the bandwidth saving is relatively small compared to the saving from standard header compression.

If the packets to be multiplexed are produced by cooperating applications, the savings due to GeRM may be much greater. In the simplest case, all the packets to be multiplexed have the same payload type, length, and CSRC list; so three octets are removed in all but the first subpacket header. If the applications generating the packets cooperate, they can collude to ensure that the sequence numbers and timestamps in the subpackets match, saving an additional six octets. Even more saving can be achieved if the applications generate packets with consecutive synchronization source identifiers, allowing the SSRC to be removed also.
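The octet counts in these two scenarios follow directly from the field sizes in the subpacket header description. The short sketch below reproduces the arithmetic; the names are illustrative, and the 40-octet figure is the RTP/UDP/IP header size quoted in the text.

```python
# Sketch of the subpacket header size arithmetic for GeRM.
# Field sizes come from the bit descriptions of the subpacket header;
# all names are illustrative only.

MANDATORY_OCTETS = 1
OPTIONAL_FIELD_OCTETS = {
    "first_octet": 1,
    "payload_type": 1,
    "sequence_number": 2,
    "timestamp": 4,
    "ssrc_msb": 3,
    "ssrc_lsb": 1,
    "length": 1,
}
RTP_UDP_IP_OCTETS = 40  # nonmultiplexed header size quoted in the text


def subpacket_header_octets(present):
    """Size of a subpacket header carrying the named optional fields."""
    return MANDATORY_OCTETS + sum(OPTIONAL_FIELD_OCTETS[name] for name in present)


# Arbitrary packets: every optional field is present.
arbitrary = set(OPTIONAL_FIELD_OCTETS)

# Same payload type, length, and CSRC list: three octets fewer.
same_format = arbitrary - {"first_octet", "payload_type", "length"}

# Matching sequence numbers and timestamps as well: six octets fewer again.
aligned = same_format - {"sequence_number", "timestamp"}

if __name__ == "__main__":
    for label, present in [("arbitrary", arbitrary),
                           ("same format", same_format),
                           ("aligned", aligned)]:
        size = subpacket_header_octets(present)
        print(label, size, "octets, saving", RTP_UDP_IP_OCTETS - size)
```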

Of course, such collusion between implementations is stretching the bounds of what is legal RTP. In particular, an application that generates nonrandom SSRC identifiers can cause serious problems in a session with standard RTP senders. Such nonrandom SSRC use is acceptable in two scenarios:

When RTP and GeRM are used to convey media data between two gateways. In this case the originators and receivers of the data are blissfully unaware that RTP and GeRM have been used to transfer data. An example might be a system that generates voice-over-IP packets as part of a gateway between two PSTN exchanges.

When the multiplexing device remaps the SSRC before inclusion in GeRM, with the demultiplexing device regenerating the original SSRC. In this case, the SSRC identifier mapping must be signaled out of band, but that may be possible as part of the call setup procedure.

At best, GeRM can produce packets with a two-octet header per multiplexed packet, which is a significant saving compared to nonmultiplexed RTP. GeRM will always reduce the header overheads, compared to nonmultiplexed RTP.

THE FUTURE OF GERM

GeRM is not a standard protocol, and there are currently no plans to complete its specification. There are several reasons for this, primary among them being concern that the requirements for applications to collude in their production of RTP headers will limit the scope of the protocol and cause interoperability problems if GeRM is applied within a network. In addition, the bandwidth saving is relatively small unless such collusion occurs, which may make GeRM less attractive.

The concepts of GeRM are useful as an application-specific multiplex between two gateways that source and sink multiple RTP streams using the same codec and that are willing to collude in generating the RTP headers for those streams. The canonical example is IP-to-PSTN gateways, in which the IP network acts as a long-distance trunk circuit between two PSTN exchanges. GeRM allows such systems to maintain most RTP semantics, while providing a multiplex that is efficient and can be implemented solely at the application layer.

Application-Specific Multiplexing

In addition to the general-purpose multiplexing protocols such as TCRTP and GeRM, various application-specific multiplexes have been proposed. The vast majority of these multiplexes have been targeted toward IP-to-PSTN gateways, in which the IP network acts as a long-distance trunk circuit between two PSTN exchanges. These gateways have many simultaneous voice connections between them, which can be multiplexed to improve the efficiency, enabling the use of low bit-rate voice codecs, and to improve scalability.

Such gateways often use a very restricted subset of the RTP protocol features. All the flows to be multiplexed commonly use the same payload format and codec, and it is likely that they do not employ silence suppression. Furthermore, each flow represents a single conversation, so there is no need for the mixer functionality of RTP. The result is that the CC, CSRC, M, P, and PT fields of the RTP header are redundant, and the sequence number and timestamp have a constant relation, allowing one of them to be elided. After these fields are removed, the only things left are the sequence number/timestamp and synchronization source (SSRC) identifier. Given such a limited use of RTP, there is a clear case for using an application-specific multiplex in these scenarios.

A telephony-specific multiplex may be defined as an operation on the RTP packets, transforming several RTP streams into a single multiplex with reduced headers. At its simplest, such a multiplex may concatenate packets with only the sequence number and a (possibly reduced) synchronization source into UDP packets, with out-of-band signaling being used to define the mapping between these reduced headers and the full RTP headers. Depending on the application, the multiplex may operate on real RTP packets, or it may be a logical operation with PSTN packets being directly converted into multiplexed packets. There are no standard solutions for such application-specific multiplexing.
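To make the simplest case concrete, the sketch below concatenates reduced-header entries into one UDP payload. Because no standard exists, the wire layout is entirely hypothetical: a one-octet channel identifier stands in for the reduced synchronization source, followed by a 16-bit sequence number and a fixed-size payload whose length is assumed to have been agreed out of band along with the channel-to-SSRC mapping.

```python
import struct

# HYPOTHETICAL layout, invented for illustration only:
#   1 octet  channel id (reduced SSRC, mapped out of band)
#   2 octets sequence number (network byte order)
#   N octets codec payload, N fixed and signaled out of band
ENTRY_HEADER = struct.Struct("!BH")


def mux(entries, payload_len):
    """Concatenate (channel, seq, payload) entries into one datagram body."""
    out = bytearray()
    for channel, seq, payload in entries:
        if len(payload) != payload_len:
            raise ValueError("payload length differs from signaled size")
        out += ENTRY_HEADER.pack(channel, seq & 0xFFFF)
        out += payload
    return bytes(out)


def demux(datagram, payload_len):
    """Split a datagram body back into (channel, seq, payload) entries."""
    step = ENTRY_HEADER.size + payload_len
    if len(datagram) % step != 0:
        raise ValueError("datagram is not a whole number of entries")
    entries = []
    for offset in range(0, len(datagram), step):
        channel, seq = ENTRY_HEADER.unpack_from(datagram, offset)
        payload = datagram[offset + ENTRY_HEADER.size:offset + step]
        entries.append((channel, seq, payload))
    return entries


if __name__ == "__main__":
    frames = [(1, 1000, bytes(20)), (2, 417, bytes(20))]
    body = mux(frames, payload_len=20)
    print(len(body), demux(body, payload_len=20) == frames)
```

A receiving gateway would use the signaled mapping to rebuild a full RTP header for each entry, deriving the timestamp from the sequence number through their constant relation.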

As an alternative, it may be possible to define an RTP payload format for TDM (Time Division Multiplexing) payloads, which would allow direct transport of PSTN voice without first mapping it to RTP. The result is a "circuit emulation" format, defined to transport the complete circuit without caring for its contents.

In this case the RTP header will relate to the circuit. The SSRC, sequence number, and timestamp relate to the circuit, not to any of the individual conversations being carried on that circuit; the payload type identifies, for example, "T1 emulation"; the mixer functionality (CC and CSRC list) is not used, nor are the marker bit and padding. Figure 12.6 shows how the process might work, with each T1 frame forming a single RTP packet.

Figure 12.6. Voice Circuit Emulation

Of course, direct emulation of a T1 line gains little because the RTP overhead is large. However, it is entirely reasonable to include several consecutive T1 frames in each RTP packet, or to emulate a higher-rate circuit, both of which reduce the RTP overhead significantly.
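A rough calculation shows the scale of the effect. The sketch assumes a T1 frame contributes 24 payload octets (24 channels of 8 bits, ignoring the framing bit) and uses the 40-octet RTP/UDP/IP header figure quoted earlier in the text; a real payload format might carry the framing bit or add further headers.

```python
# Sketch: share of each packet taken by headers in T1 circuit emulation.
# ASSUMPTION: 24 payload octets per T1 frame (framing bit ignored).
# ASSUMPTION: 40 octets of RTP/UDP/IP headers per packet.

HEADER_OCTETS = 40
T1_FRAME_PAYLOAD_OCTETS = 24


def header_overhead(frames_per_packet):
    """Fraction of each packet's octets taken up by headers."""
    payload = frames_per_packet * T1_FRAME_PAYLOAD_OCTETS
    return HEADER_OCTETS / (HEADER_OCTETS + payload)


if __name__ == "__main__":
    for frames in (1, 4, 8, 16):
        print(frames, "frame(s) per packet:",
              format(header_overhead(frames), ".1%"), "header overhead")
```

Under these assumptions a single frame per packet spends more octets on headers than on voice, while packing several frames shifts the balance toward payload at the cost of added packetization delay.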

The IETF has a Pseudo-Wire Edge-to-Edge Emulation working group, which is developing standards for circuit emulation, including PSTN (Public Switched Telephone Network), SONET (Synchronous Optical Network), and ATM (Asynchronous Transfer Mode) circuits. These standards are not yet complete, but an RTP payload format for circuit emulation is one of the proposed solutions.

The circuit emulation approach to IP-to-PSTN gateway design is a closer fit with the RTP philosophy than are application-specific multiplexing solutions. Circuit emulation is highly recommended as a solution for this particular application.