RTP: Audio and Video for the Internet

The sender enables synchronization of media streams at the receiver by running a common reference clock and periodically announcing, through RTCP, the relationship between the reference clock time and the media stream time, as well as the identities of the streams to be synchronized. The reference clock runs at a fixed rate; correspondence points between the reference clock and the media stream allow the receiver to work out the relative timing relationship between the media streams. This process is shown in Figure 7.2.

Figure 7.2. Mapping Media Time Lines to a Common Clock at the Sender
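As a rough illustration of how a receiver can use these correspondence points, the following Python sketch maps RTP timestamps from two streams onto the shared reference timeline. It is a minimal sketch under stated assumptions. `SenderReport` and `to_reference_time` are hypothetical names. The reference time is held as float seconds rather than as a 64-bit NTP value. The 8 kHz audio and 90 kHz video clock rates are example values.

```python
from dataclasses import dataclass

RTP_MOD = 2 ** 32  # RTP timestamps are 32-bit and wrap around


@dataclass
class SenderReport:
    """Hypothetical record of the latest correspondence point for one stream."""
    ntp_seconds: float  # reference clock time from the RTCP SR, as float seconds
    rtp_timestamp: int  # RTP timestamp of the same instant
    clock_rate: int     # media clock rate in ticks per second


def to_reference_time(sr: SenderReport, rtp_ts: int) -> float:
    """Place an RTP timestamp on the sender's reference timeline."""
    # Interpret the difference as a signed 32-bit value so that timestamps
    # slightly before the SR, or after a wraparound, map to the right instant.
    delta = (rtp_ts - sr.rtp_timestamp) % RTP_MOD
    if delta >= RTP_MOD // 2:
        delta -= RTP_MOD
    return sr.ntp_seconds + delta / sr.clock_rate


# Example: 8 kHz audio and 90 kHz video from the same sender.
audio_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=16000, clock_rate=8000)
video_sr = SenderReport(ntp_seconds=1000.5, rtp_timestamp=90000, clock_rate=90000)

audio_time = to_reference_time(audio_sr, 24000)   # 1000.0 + 8000/8000
video_time = to_reference_time(video_sr, 135000)  # 1000.5 + 45000/90000
print(audio_time - video_time)  # 0.0: these samples should be played out together
```

The sketch uses only differences from each correspondence point. For that reason, the absolute value of the reference clock does not matter. What matters is that both streams were mapped against the same clock.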

The correspondence between reference clock and media clock is noted when each RTCP packet is generated. A sampling of the reference clock, $T_{reference}$, is included in the packet along with a calculated RTP timestamp, $T_{RTP} = T_{reference} \times R_{audio} + O_{audio}$. The calculation must be made modulo $2^{32}$, to restrict the result to the range of the 32-bit RTP timestamp. The offset is calculated as $O_{audio} = T_{audio} - (T_{available} - D_{audio\_capture}) \times R_{audio}$. Here $T_{audio}$ is the RTP timestamp of a media sample, and $T_{available}$ is the reference clock time at which that sample became available to the sending application. $D_{audio\_capture}$ is the capture delay, and $R_{audio}$ is the conversion factor between the media and reference timelines. Operating system latencies can delay $T_{available}$ and cause variation in the offset. The application should filter this variation, choosing the value obtained with the minimum latency. (The obvious changes to the formulae are made in the case of video.)
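The sender-side calculation can be sketched in Python as follows. This is an illustrative sketch, not a prescribed implementation. `AudioClockMapper` is a hypothetical helper, and reference times are float seconds. $T_{audio}$ is assumed to be an unwrapped sample count so that offsets can be compared. Extra latency makes $T_{available}$ later, so it can only lower the computed offset. The sketch therefore keeps the largest offset it has observed, which is the estimate obtained with the least latency.

```python
RTP_MOD = 2 ** 32  # RTP timestamps are 32 bits wide


class AudioClockMapper:
    """Hypothetical sender-side mapping from reference time to RTP timestamps."""

    def __init__(self, rate_hz: int, capture_delay_s: float) -> None:
        self.rate = rate_hz                   # R_audio: RTP ticks per reference second
        self.capture_delay = capture_delay_s  # D_audio_capture, in seconds
        self.offset = None                    # best O_audio estimate so far

    def observe(self, t_audio: int, t_available: float) -> None:
        """Record one block of media.

        t_audio: RTP timestamp of the block (assumed unwrapped).
        t_available: reference clock time when the application received it.
        """
        candidate = t_audio - (t_available - self.capture_delay) * self.rate
        # OS latency can only make t_available later, which lowers the
        # candidate, so keep the largest offset seen (least-delayed block).
        if self.offset is None or candidate > self.offset:
            self.offset = candidate

    def rtp_timestamp(self, t_reference: float) -> int:
        """T_RTP = T_reference x R_audio + O_audio, reduced modulo 2^32."""
        if self.offset is None:
            raise ValueError("no offset estimate yet")
        return round(t_reference * self.rate + self.offset) % RTP_MOD


mapper = AudioClockMapper(rate_hz=8000, capture_delay_s=0.03125)
mapper.observe(t_audio=84000, t_available=10.5)      # offset candidate 250
mapper.observe(t_audio=84160, t_available=10.53125)  # delayed delivery: candidate 160
print(mapper.rtp_timestamp(11.0))  # 88250, for inclusion in an RTCP SR
```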

Each application on the sender that is transmitting RTP streams needs access to the common reference clock, $T_{reference}$, and must identify its media with reference to a canonical source identifier. The sending applications should be aware of the media capture delay (for example, $D_{audio\_capture}$). This delay can be significant, so it should be taken into account in the calculation and announcement of the relationship between reference clock times and media clock times.

The common reference clock is the "wall clock" time used by RTCP. It takes the form of an NTP-format timestamp, counting seconds and fractions of a second since midnight UTC (Coordinated Universal Time) on January 1, 1900 [5]. Senders that have no knowledge of the wall clock time may use a system-specific clock such as "system uptime" to calculate NTP-format timestamps as an alternative. The choice of a reference clock does not affect synchronization, as long as it is done consistently for all media. Senders periodically establish a correspondence between the media clock for each stream and the common reference clock. This correspondence is communicated to receivers via RTCP sender report packets, as described in the section titled RTCP SR: Sender Reports in Chapter 5, RTP Control Protocol.
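A minimal Python sketch of producing NTP-format reference timestamps follows. It assumes the host's Unix clock (`time.time()`) as the wall-clock source and uses `time.monotonic()` as a stand-in for "system uptime". The function names are hypothetical. The constant 2,208,988,800 is the number of seconds between the NTP epoch (1900) and the Unix epoch (1970).

```python
import time
from typing import Tuple

# Seconds from the NTP epoch (midnight UTC, 1 January 1900) to the Unix epoch (1970).
NTP_UNIX_EPOCH_DELTA = 2_208_988_800


def to_ntp_format(seconds: float) -> Tuple[int, int]:
    """Split a non-negative time in seconds into the two 32-bit halves of an
    NTP-format timestamp: whole seconds and a fraction in units of 2**-32 s."""
    whole = int(seconds)
    fraction = int((seconds - whole) * 2 ** 32)
    # The seconds field is only 32 bits wide and wraps in 2036 (NTP era rollover).
    return whole % 2 ** 32, fraction


def wall_clock_ntp() -> Tuple[int, int]:
    """Wall-clock reference: seconds since 1900, derived from the Unix clock."""
    return to_ntp_format(time.time() + NTP_UNIX_EPOCH_DELTA)


def uptime_ntp() -> Tuple[int, int]:
    """Fallback reference: time.monotonic() counts from an arbitrary point, which
    is acceptable only if every stream from this sender uses the same clock."""
    return to_ntp_format(time.monotonic())


print(to_ntp_format(NTP_UNIX_EPOCH_DELTA + 0.25))  # (2208988800, 1073741824)
```

Either function can serve as the sender's $T_{reference}$. What matters for synchronization is that every stream uses the same one.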

In typical scenarios, there is no requirement for the sender or receiver to be synchronized to an external clock. In particular, although the wall clock time in RTCP sender report packets uses the format of an NTP timestamp, it is not required to be synchronized to an NTP time source. Sender and receiver clocks never have to be synchronized to each other. Receivers do not care about the absolute value of the NTP format timestamp in RTCP sender report packets, only that the clock is common between media, and of sufficient accuracy and stability to allow synchronization.

Synchronized clocks are required only when media streams generated by different hosts are being synchronized. An example would be multiple cameras giving different viewpoints on a scene, connected to separate hosts with independent network connections. In this instance the sending hosts need to use a time protocol or some other means to align their reference clocks to a common time base. RTP does not mandate any particular method of defining that time base, but the Network Time Protocol (NTP) [5] may be appropriate, depending on the degree of synchronization required. Figure 7.3 shows the requirements for clock synchronization when media streams from different hosts are to be synchronized at playout.

Figure 7.3. Synchronization of Media Generated by Different Hosts
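When the sending hosts use NTP, each host estimates its clock offset from a time server using four timestamps per request/response exchange. The sketch below shows that standard NTP on-wire calculation in Python. The function names are hypothetical. Picking the exchange with the smallest round-trip delay is a simplification of NTP's clock filtering. Real NTP also combines multiple servers and disciplines the local clock gradually.

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float, float, float]  # (t1, t2, t3, t4) in seconds


def offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """NTP on-wire calculation for one request/response exchange.

    t1: client sends request (client clock)   t2: server receives it (server clock)
    t3: server sends reply (server clock)     t4: client receives reply (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)          # round trip, excluding server processing
    return offset, delay


def best_offset(samples: Iterable[Sample]) -> float:
    """Trust the exchange with the smallest round-trip delay: the offset error
    from asymmetric paths is at most half the round-trip delay."""
    offset, _ = min((offset_and_delay(*s) for s in samples), key=lambda od: od[1])
    return offset


exchanges = [(100.0, 105.125, 105.25, 100.375), (200.0, 205.5, 205.625, 200.875)]
print(best_offset(exchanges))  # 5.0: the server clock is 5 s ahead of this host
```

In this sketch, a host would add the estimated offset to its local clock before generating RTCP sender report timestamps. The reports from all hosts would then refer to the same time base.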

The other requirement for synchronization is to identify sources that are to be synchronized. RTP does this by giving the related sources a shared name, so a receiver knows which streams it should attempt to synchronize and which are independent. Each RTP packet contains a synchronization source (SSRC) identifier to associate the source with a media time base. The SSRC identifier is chosen randomly and will not be the same for all the media streams to be synchronized. It may also change during a session if identifiers collide, as explained in Chapter 4, RTP Data Transfer Protocol. RTCP source description (SDES) packets provide a mapping from SSRC identifiers to a persistent canonical name (CNAME). A sender should ensure that RTP sessions to be synchronized on playout have a common CNAME so that receivers know to align the media.
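On the receiving side, the SSRC-to-CNAME mapping can be kept in a small table. The Python sketch below is hypothetical. The class, its method names, and the session labels are illustrative, and parsing of the SDES and BYE packets themselves is omitted. It keys each source by (session, SSRC) because SSRC values are only unique within a single RTP session.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

StreamKey = Tuple[str, int]  # (RTP session label, SSRC)


class SyncGroups:
    """Hypothetical receiver-side table grouping sources by CNAME."""

    def __init__(self) -> None:
        self.cname_of: Dict[StreamKey, str] = {}

    def on_sdes_cname(self, session: str, ssrc: int, cname: str) -> None:
        # SSRCs are only unique within one RTP session, so key on both.
        self.cname_of[(session, ssrc)] = cname

    def on_bye(self, session: str, ssrc: int) -> None:
        # The source left (or changed SSRC after a collision); drop the old binding.
        self.cname_of.pop((session, ssrc), None)

    def groups(self) -> Dict[str, Set[StreamKey]]:
        """Streams sharing a CNAME should be synchronized; others are independent."""
        result: Dict[str, Set[StreamKey]] = defaultdict(set)
        for key, cname in self.cname_of.items():
            result[cname].add(key)
        return dict(result)


table = SyncGroups()
table.on_sdes_cname("audio", 0x1234ABCD, "alice@192.0.2.10")
table.on_sdes_cname("video", 0x9F00BEEF, "alice@192.0.2.10")
table.on_sdes_cname("audio", 0x0BADCAFE, "bob@198.51.100.7")
print(sorted(table.groups()))  # ['alice@192.0.2.10', 'bob@198.51.100.7']
```

The CNAME is persistent while the SSRC may change, so a source that picks a new SSRC after a collision simply rejoins its group when its next SDES packet arrives.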

The canonical name is chosen algorithmically according to the user name and network address of the source host (see the section RTCP SDES: Source Description in Chapter 5, RTP Control Protocol). If multiple media streams are being generated by a single host, the task of ensuring that they have a common CNAME, and hence can be synchronized, is simple. Sometimes the goal is to synchronize media streams generated by several hosts, for example, when one host captures and transmits audio while another transmits video. In that case the choice of CNAME is less obvious, because the default method in the RTP standard would require each host to use its own IP address as part of the CNAME. The solution is for the hosts to conspire in choosing a common CNAME for all streams that are to be synchronized, even if this means that some hosts use a CNAME that doesn't match their network address. RTP does not specify the mechanism by which this conspiracy happens. One solution might be to use the lowest-numbered IP address of the hosts when constructing the CNAME; another might be for the audio to use the CNAME of the video host (or vice versa). This coordination would typically be provided by a session control protocol (for example, SIP or H.323), outside the scope of RTP. A session control protocol could also indicate which streams should be synchronized by a method that does not rely on the CNAME.
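One of the conspiracy rules suggested above, using the lowest-numbered IP address, can be sketched in Python as follows. This is an illustration that rests on several assumptions. The function name is hypothetical. The sketch also presumes that the session control protocol has already given every host the same user name and the same list of participating addresses.

```python
import ipaddress
from typing import Iterable


def shared_cname(user: str, host_addresses: Iterable[str]) -> str:
    """Hypothetical rule: all cooperating hosts build the same CNAME from the
    lowest-numbered IP address among them."""
    addresses = [ipaddress.ip_address(a) for a in host_addresses]
    # Compare numerically, not as strings ("192.0.2.20" < "192.0.2.3" as text).
    # IPv4 and IPv6 objects cannot be compared directly, so order by version first.
    lowest = min(addresses, key=lambda a: (a.version, int(a)))
    return f"{user}@{lowest}"


print(shared_cname("alice", ["192.0.2.20", "192.0.2.3", "198.51.100.1"]))  # alice@192.0.2.3
```

The rule is deterministic, so each host can compute the shared CNAME independently once all hosts hold the same inputs. No further message exchange is needed after that.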
