RTP: Audio and Video for the Internet

Packet Reception

An RTP session comprises both data and control flows, running on distinct ports (usually the data packets flow on an even-numbered port, with control packets on the next higher, odd-numbered, port). This means that a receiving application will open two sockets for each session: one for data, one for control. Because RTP runs above UDP/IP, the sockets used are standard SOCK_DGRAM sockets, as provided by the Berkeley sockets API on UNIX-like systems, and by Winsock on Microsoft platforms.
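
As a rough illustration, a minimal sketch of opening such a socket pair on an IPv4 host might look like the following (the open_rtp_sockets() helper, the example port numbers, and the simplified error handling are assumptions for illustration, not code from this book):

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: open a UDP socket bound to the given local port. */
static int open_udp_socket(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* RTP data on the even port, RTCP control on the next higher (odd) port. */
int open_rtp_sockets(uint16_t base_port, int *fd_data, int *fd_ctrl)
{
    *fd_data = open_udp_socket(base_port);       /* e.g., 5004 */
    *fd_ctrl = open_udp_socket(base_port + 1);   /* e.g., 5005 */
    return (*fd_data >= 0 && *fd_ctrl >= 0) ? 0 : -1;
}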

Once the receiving sockets have been created, the application should prepare to receive packets from the network and store them for further processing. Many applications implement this as a loop, calling select() repeatedly to receive packets; for example:

fd_data = create_socket(...);
fd_ctrl = create_socket(...);
while (not_done) {
    FD_ZERO(&rfd);
    FD_SET(fd_data, &rfd);
    FD_SET(fd_ctrl, &rfd);
    timeout = ...;
    if (select(max_fd, &rfd, NULL, NULL, timeout) > 0) {
        if (FD_ISSET(fd_data, &rfd)) {
            ...validate data packet
            ...process data packet
        }
        if (FD_ISSET(fd_ctrl, &rfd)) {
            ...validate control packet
            ...process control packet
        }
    }
    ...do other processing
}

Data and control packets are validated for correctness as described in Chapters 4, RTP Data Transfer Protocol, and 5, RTP Control Protocol, and processed as described in the next two sections. The timeout of the select() operation is typically chosen according to the framing interval of the media. For example, a system receiving audio with 20-millisecond packet duration will implement a 20-millisecond timeout, allowing the other processing, such as decoding the received packets, to occur synchronously with arrival and playout, and resulting in an application that loops every 20 milliseconds.

Other implementations may be event driven rather than having an explicit loop, but the basic concept remains: Packets are continually validated and processed as they arrive from the network, and other application processing must be done in parallel to this (either explicitly time-sliced, as shown above, or as a separate thread), with the timing of the application driven by the media processing requirements. Real-time operation is essential to RTP receivers; packets must be processed at the rate they arrive, or reception quality will be impaired.

Receiving Data Packets

The first stage of the media playout process is to capture RTP data packets from the network, and to buffer those packets for further processing. Because the network is prone to disrupt the interpacket timing, as shown in Figure 6.5, there will be bursts when several packets arrive at once and/or gaps when no packets arrive, and packets may even arrive out of order. The receiver does not know when data packets are going to arrive, so it should be prepared to accept packets in bursts, and in any order.

Figure 6.5. Disruption of Interpacket Timing during Network Transit

As packets are received, they are validated for correctness, their arrival time is noted, and they are added to a per-sender input queue, sorted by RTP timestamp, for later processing. These steps decouple the arrival rate of packets from the rate at which they are processed and played to the user, allowing the application to cope with variation in the arrival rate. Figure 6.6 shows the separation between the packet reception and playout routines, which are linked only by the input queues.

Figure 6.6. Separation of Packet Reception from Playout, Using Input Queues
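
As a rough illustration of the per-sender input queue, the following sketch inserts a validated packet into a linked list kept sorted by RTP timestamp (the structure and field names are hypothetical; the modular timestamp comparison tolerates wraparound of the 32-bit timestamp):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical buffered-packet structure: one entry per received RTP packet. */
typedef struct rtp_packet {
    uint32_t           ts;            /* RTP timestamp from the packet header  */
    uint16_t           seq;           /* RTP sequence number                   */
    double             arrival_time;  /* arrival time M, on the media timeline */
    uint8_t           *data;          /* the packet itself                     */
    size_t             data_len;
    struct rtp_packet *next;
} rtp_packet_t;

/* Insert a validated packet into the per-sender queue, keeping the queue
 * sorted by RTP timestamp so the playout routine can pull packets in
 * media order even if they arrived out of order.                          */
void input_queue_insert(rtp_packet_t **head, rtp_packet_t *pkt)
{
    rtp_packet_t **pp = head;
    while (*pp != NULL && (int32_t)((*pp)->ts - pkt->ts) <= 0) {
        pp = &(*pp)->next;    /* modular comparison handles timestamp wrap */
    }
    pkt->next = *pp;
    *pp = pkt;
}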

It is important to store the exact arrival time, M, of RTP data packets so that the interarrival jitter can be calculated. Inaccurate arrival time measurements give the appearance of network jitter and cause the playout delay to increase. The arrival time should be measured according to a local reference wall clock, T, converted to the media clock rate, R. It is unlikely that the receiver has such a clock, so usually we calculate the arrival time by sampling the reference clock (typically the system wall clock time) and converting it to the local timeline:

M = (T × R) + offset

where the offset is used to map from the reference clock to the media timeline, in the process correcting for skew between the media clock and the reference clock.
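
As an illustration, a receiver that uses the system wall clock as its reference might compute the arrival time roughly as follows (a sketch only; the media_clock_rate and offset parameters are assumptions, and skew correction is not shown):

#include <stdint.h>
#include <sys/time.h>

/* Convert the reference wall clock, T, to the media timeline:
 *     M = (T * R) + offset
 * where R is the media clock rate (e.g., 8000 Hz for narrowband audio)
 * and the offset maps the reference clock onto the media timeline.      */
uint32_t media_arrival_time(uint32_t media_clock_rate, uint32_t offset)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);                      /* sample reference clock T */

    double t = tv.tv_sec + (tv.tv_usec / 1000000.0);
    return (uint32_t)(t * media_clock_rate) + offset;
}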

As noted earlier, processing of data packets may be time-sliced along with packet reception in a single-threaded application, or it may run in a separate thread in a multithreaded system. In a time-sliced design, a single thread handles both packet reception and playout. On each loop, all outstanding packets are read from the socket and inserted into the correct input queue. Packets are removed from the queues as needed and scheduled for playout. If packets arrive in bursts, some may remain in their input queue for multiple iterations of the loop, depending on the desired rate of playout and available processing capacity.

A multithreaded receiver typically has one thread waiting for data to arrive on the socket, sorting arriving packets onto the correct input queue. Other threads pull data from the input queues and arrange for the decoding and playout of the media. The asynchronous operation of the threads, along with the buffering in the input queues, effectively decouples the playout process from short-term variations in the input rate.
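
A sketch of the reception thread in such a design might look like the following (POSIX threads; the queue_lock mutex and the enqueue_packet() helper are hypothetical, and packet validation is omitted for brevity):

#include <stdint.h>
#include <stddef.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_PACKET_SIZE 1500   /* assumes an Ethernet-sized MTU */

extern pthread_mutex_t queue_lock;                  /* protects the input queues      */
void enqueue_packet(const uint8_t *buf, size_t len); /* hypothetical: validate packet
                                                        and sort onto per-sender queue */

/* Reception thread: blocks on the data socket and sorts arriving packets
 * onto the input queues; playout threads drain the queues asynchronously. */
void *rtp_reception_thread(void *arg)
{
    int     fd_data = *(int *)arg;
    uint8_t buf[MAX_PACKET_SIZE];

    for (;;) {
        ssize_t len = recv(fd_data, buf, sizeof(buf), 0);
        if (len <= 0) {
            continue;
        }
        pthread_mutex_lock(&queue_lock);
        enqueue_packet(buf, (size_t)len);
        pthread_mutex_unlock(&queue_lock);
    }
    return NULL;
}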

No matter what design is chosen, an application will usually not be able to receive and process packets continually. The input queues accommodate fluctuation in the playout process within the application, but what of delays in the packet reception routine? Fortunately, most general-purpose operating systems handle reception of UDP/IP packets on an interrupt-driven basis and can buffer packets at the socket level even when the application is busy. This capability provides limited buffering before packets reach the application. The default socket buffer is suitable for most implementations, but applications that receive high-rate streams or have significant periods of time when they are unable to handle reception may need to increase the size of the socket buffer beyond its default value (the setsockopt(fd, SOL_SOCKET, SO_RCVBUF, ...) function performs this operation on many systems). The larger socket buffer accommodates varying delays in packet reception processing, but the time packets spend in the socket buffer appears to the application as jitter in the network. The application might increase its playout delay to compensate for this perceived variation.
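
For example, on systems that support it, a receiver expecting a high-rate stream might request a larger receive buffer as sketched below (the 256-kilobyte value is purely illustrative, and the operating system may cap or round the size actually granted):

#include <stdio.h>
#include <sys/socket.h>

/* Ask the kernel for a larger receive buffer on the RTP data socket.
 * The value actually granted can be read back with getsockopt().     */
void increase_socket_buffer(int fd_data)
{
    int size = 256 * 1024;    /* illustrative value, not a recommendation */

    if (setsockopt(fd_data, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
    }
}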

Receiving Control Packets

In parallel with the arrival of data packets, an application must be prepared to receive, validate, process, and send RTCP control packets. The information in the RTCP packets is used to maintain the database of the senders and receivers within a session, as discussed in Chapter 5, RTP Control Protocol, and for participant validation and identification, adaptation to network conditions, and lip synchronization. The participant database is also a good place from which to hang the participant-specific input queues, playout buffer, and other state needed by the receiver.

Single-threaded applications typically include both data and control sockets in their select() loop, interleaving reception of control packets with all other processing. Multithreaded applications can devote a thread to RTCP reception and processing. Because RTCP packets are infrequent compared to data packets, their processing overhead is usually low, and that processing is not especially time-critical. It is, however, important to record the exact arrival time of sender report (SR) packets, because this value is returned in receiver report (RR) packets and used in the round-trip time calculation.

When RTCP sender/receiver report packets arrive, describing the reception quality as seen at a particular receiver, the information they contain is stored. Parsing the report blocks in SR/RR packets is straightforward, provided you remember that the data is in network byte order and must be converted to host order before being used. The count field in the RTCP header indicates how many report blocks are present; remember that zero is a valid value, indicating that the sender of the RTCP packet is not receiving any RTP data packets.
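
As a sketch of what this parsing might look like, the following reads one report block into host byte order, assuming the packet has already been validated and using the block layout from RFC 3550 (the structure names here are hypothetical):

#include <stdint.h>
#include <arpa/inet.h>   /* ntohl() */

/* One SR/RR report block as it appears on the wire (24 bytes, RFC 3550). */
typedef struct {
    uint32_t ssrc;        /* source this block reports on              */
    uint32_t lost;        /* fraction lost (8 bits) + cumulative (24)  */
    uint32_t last_seq;    /* extended highest sequence number received */
    uint32_t jitter;      /* interarrival jitter                       */
    uint32_t lsr;         /* last SR timestamp                         */
    uint32_t dlsr;        /* delay since last SR                       */
} rtcp_report_block_t;

/* Parsed reception-quality data, in host byte order (hypothetical form). */
typedef struct {
    uint32_t ssrc;
    uint8_t  fraction_lost;
    uint32_t cumulative_lost;
    uint32_t ext_highest_seq;
    uint32_t jitter;
    uint32_t lsr;
    uint32_t dlsr;
} reception_report_t;

/* Convert one report block from network to host byte order and unpack it. */
void parse_report_block(const rtcp_report_block_t *rb, reception_report_t *out)
{
    uint32_t lost_word   = ntohl(rb->lost);

    out->ssrc            = ntohl(rb->ssrc);
    out->fraction_lost   = (uint8_t)(lost_word >> 24);
    out->cumulative_lost = lost_word & 0x00ffffff;
    out->ext_highest_seq = ntohl(rb->last_seq);
    out->jitter          = ntohl(rb->jitter);
    out->lsr             = ntohl(rb->lsr);
    out->dlsr            = ntohl(rb->dlsr);
}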

The main use of RTCP sender/receiver reports is for an application to monitor reception of the streams it has sent: If the reports indicate poor reception, it is possible either to add error protection codes or to reduce the sending rate to compensate. In multisender sessions it is also possible to monitor the quality of other senders, as seen by other receivers; for example, a network operations center might monitor SR/RR packets as a check that the network is operating correctly. Applications typically store reception quality data as it is received, and periodically they use the stored data to adapt their transmission.

Sender reports also contain the mapping between the RTP media clock and the sender's reference clock, used for lip synchronization (see Chapter 7), and a count of the amount of data sent. Once again, this information is in network byte order and must be converted before use; it should also be stored if it is to be used later for lip synchronization.
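
As an illustration, the sender information block of an SR packet might be unpacked and stored along these lines (field layout as in RFC 3550; the participant_sync_t structure is hypothetical):

#include <stdint.h>
#include <arpa/inet.h>

/* Sender information block of an RTCP SR packet, as on the wire (RFC 3550). */
typedef struct {
    uint32_t ntp_sec;       /* NTP timestamp, most significant word  */
    uint32_t ntp_frac;      /* NTP timestamp, least significant word */
    uint32_t rtp_ts;        /* RTP timestamp at the same instant     */
    uint32_t packet_count;  /* sender's packet count                 */
    uint32_t octet_count;   /* sender's octet count                  */
} rtcp_sender_info_t;

/* Hypothetical per-participant state used later for lip synchronization. */
typedef struct {
    uint32_t last_sr_ntp_sec;
    uint32_t last_sr_ntp_frac;
    uint32_t last_sr_rtp_ts;
    uint32_t sender_packets;
    uint32_t sender_octets;
} participant_sync_t;

/* Convert from network byte order and record the NTP<->RTP mapping. */
void store_sender_info(participant_sync_t *p, const rtcp_sender_info_t *si)
{
    p->last_sr_ntp_sec  = ntohl(si->ntp_sec);
    p->last_sr_ntp_frac = ntohl(si->ntp_frac);
    p->last_sr_rtp_ts   = ntohl(si->rtp_ts);
    p->sender_packets   = ntohl(si->packet_count);
    p->sender_octets    = ntohl(si->octet_count);
}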

When RTCP source description packets arrive, the information they contain is stored and may be displayed to the user. The RTP specification contains sample code to parse SDES packets (see Appendix A.5 of the specification 50). The SDES CNAME (canonical name) provides the link between audio and video streams, indicating where lip synchronization should be performed. It is also used to group multiple streams coming from a single source (for example, if a participant has multiple cameras sending video to a single RTP session), and this may affect the way media is displayed to the user.

Once RTCP packets have been validated, the information they contain is added to the participant database. Because the validity checks for RTCP packets are strong, the presence of a participant in the database is a solid indication that the participant is valid. This is a useful check when RTP packets are being validated: If the SSRC in an RTP data packet was previously seen in an RTCP packet, it is highly likely to be a valid source.

When RTCP BYE packets are received, entries in the participant database are marked for later removal. As noted in Chapter 5, RTP Control Protocol, entries are not removed immediately but should be kept for some small time to allow any delayed packets to arrive. (My own implementation uses a fixed two-second timeout; the precise value is unimportant, provided that it is larger than the typical network timing jitter.) Receivers also perform periodic housekeeping to time out inactive participants. Performing this task with every packet is not necessary; once per RTCP report interval is sufficient.
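
A sketch of this housekeeping pass, assuming a simple linked-list participant database, the two-second removal delay mentioned above, and an illustrative inactivity timeout (all structure and constant names here are hypothetical):

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical participant database entry. */
typedef struct participant {
    uint32_t            ssrc;
    time_t              last_activity;   /* time of last RTP/RTCP packet */
    bool                bye_received;    /* BYE seen, pending removal    */
    time_t              bye_time;        /* when the BYE arrived         */
    struct participant *next;
} participant_t;

#define BYE_LINGER_SECONDS      2     /* delay before removing a BYE'd entry */
#define INACTIVITY_TIMEOUT_SECS 30    /* illustrative inactivity timeout     */

/* Called once per RTCP report interval: remove entries whose BYE linger
 * period has expired, and time out participants that have gone silent.  */
void participant_housekeeping(participant_t **head)
{
    time_t         now = time(NULL);
    participant_t **pp = head;

    while (*pp != NULL) {
        participant_t *p = *pp;
        bool bye_expired = p->bye_received &&
                           (now - p->bye_time) >= BYE_LINGER_SECONDS;
        bool inactive    = (now - p->last_activity) >= INACTIVITY_TIMEOUT_SECS;

        if (bye_expired || inactive) {
            *pp = p->next;
            free(p);             /* plus any per-participant queues/buffers */
        } else {
            pp = &p->next;
        }
    }
}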
