RTP: Audio and Video for the Internet
The rate at which each participant sends RTCP packets is not fixed but varies according to the size of the session and the format of the media stream. The aim is to restrict the total amount of RTCP traffic to a fixed fraction (usually 5%) of the session bandwidth. This goal is achieved by a reduction in the rate at which each participant sends RTCP packets as the size of the session increases. In a two-party telephone call using RTP, each participant will send an RTCP report every few seconds; in a session with thousands of participants (for example, an Internet radio station) the interval between RTCP reports from each listener may be many minutes.

Each participant decides when to send RTCP packets on the basis of the set of rules described later in this section. It is important to follow these rules closely, especially for implementations that may be used in large sessions. If implemented correctly, RTCP will scale to sessions with many thousands of members. If not, the amount of control traffic will grow linearly with the number of members and will cause significant network congestion.

Reporting Interval
Compound RTCP packets are sent periodically, according to a randomized timer. The average time each participant waits between sending RTCP packets is known as the reporting interval. It is calculated on the basis of several factors:
If the number of senders is greater than zero but less than one-quarter of the total number of participants, the reporting interval depends on whether we are sending. If we are sending, the reporting interval is set to the number of senders multiplied by the average size of RTCP packets, divided by 25% of the desired RTCP bandwidth. If we are not sending, the reporting interval is set to the number of receivers multiplied by the average size of RTCP packets, divided by 75% of the desired RTCP bandwidth:

    if ((senders > 0) and (senders < (25% of total number of participants))) {
        if (we are sending) {
            Interval = average RTCP size * senders / (25% of RTCP bandwidth)
        } else {
            Interval = average RTCP size * receivers / (75% of RTCP bandwidth)
        }
    }

If there are no senders, or if more than one-quarter of the members are senders, the reporting interval is calculated as the average size of the RTCP packets multiplied by the total number of members, divided by the desired RTCP bandwidth:

    if ((senders = 0) or (senders > (25% of total number of participants))) {
        Interval = average RTCP size * total number of members / RTCP bandwidth
    }

These rules ensure that senders have a significant fraction of the RTCP bandwidth, sharing at least one-quarter of the total RTCP bandwidth. The RTCP packets required for lip synchronization and identification of senders can therefore be sent comparatively quickly, while still allowing reports from receivers.

The resulting interval is always compared to an absolute minimum value, which by default is chosen to be 5 seconds. If the interval is less than the minimum interval, it is set to the minimum:

    if (Interval < minimum interval) {
        Interval = minimum interval
    }

In some cases it is desirable to send RTCP more often than the default minimum interval allows. For example, if the data rate is high and the application demands more timely reception quality statistics, a shorter minimum interval will be required.
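The rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name and parameters are hypothetical, sizes are assumed to be in octets and bandwidth in octets per second, and the total membership is assumed to be the sum of senders and receivers. The text does not say which rule applies when exactly one-quarter of the members are senders; the sketch arbitrarily uses the total-membership rule in that case.

```python
def reporting_interval(senders, receivers, we_are_sending,
                       avg_rtcp_size, rtcp_bandwidth, minimum=5.0):
    """Average RTCP reporting interval in seconds (illustrative sketch).

    avg_rtcp_size is in octets, rtcp_bandwidth in octets per second.
    """
    members = senders + receivers  # assumption: every member is one or the other
    if 0 < senders < 0.25 * members:
        if we_are_sending:
            # Senders share one-quarter of the RTCP bandwidth.
            interval = avg_rtcp_size * senders / (0.25 * rtcp_bandwidth)
        else:
            # Receivers share the remaining three-quarters.
            interval = avg_rtcp_size * receivers / (0.75 * rtcp_bandwidth)
    else:
        # No senders, or many senders: everyone shares the full RTCP bandwidth.
        interval = avg_rtcp_size * members / rtcp_bandwidth
    # Never report more often than the minimum interval allows.
    return max(interval, minimum)


# Hypothetical session: 2 senders, 498 receivers, 100-octet RTCP packets,
# 1000 octets per second of RTCP bandwidth.
print(reporting_interval(2, 498, True, 100, 1000))   # sender: clamped to 5.0
print(reporting_interval(2, 498, False, 100, 1000))  # receiver: 66.4
```

Passing the minimum as a parameter keeps the clamp separate from the bandwidth-sharing rules, so a different minimum can be substituted without touching the rest of the calculation.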
For applications that need more frequent reports, the latest revision of the RTP specification allows a reduced minimum interval:

    Minimum interval = 360 / (session bandwidth in Kbps)

This reduced minimum is smaller than 5 seconds for session bandwidths greater than 72 Kbps. When the reduced minimum is being used, it is important to remember that some participants may still be using the default value of 5 seconds, and to take this into account when determining whether to time out a participant because of inactivity.

The resulting interval is the average time between RTCP packets. The transmission rules described next are then used to convert this value into the actual send time for each packet. The reporting interval should be recalculated whenever the number of participants in a session changes, or when the fraction of senders changes.

Basic Transmission Rules
When an application starts, the first RTCP packet is scheduled for transmission on the basis of an initial estimate of the reporting interval. When the first packet is sent, the second packet is scheduled, and so on. The actual time between packets is randomized, between one-half and one and a half times the reporting interval, to avoid synchronization of the participants' reports, which could cause them to arrive all at once, every time. Finally, if this is the first RTCP packet sent, the interval is halved to provide faster feedback that a new member has joined. The next send time is therefore calculated as shown here:

    I = (Interval * random[0.5, 1.5])
    if (this is the first RTCP packet we are sending) {
        I *= 0.5
    }
    next_rtcp_send_time = current_time + I

The routine random[0.5, 1.5] generates a random number in the interval 0.5 to 1.5. On some platforms it may be implemented with the rand() library function; on others, a call such as drand48() may be a better source of randomness.

As an example of the basic transmission rules, consider an Internet radio station sending 128-Kbps MP3 audio using RTP-over-IP multicast, with an audience of 1,000 members. The default values for the minimum reporting interval (5 seconds) and RTCP bandwidth fraction (5%) are used, and the average size of RTCP packets is assumed to be 90 octets (including UDP/IP headers). When a new audience member starts up, it will not be aware of the other listeners, because it has not yet received any RTCP data. It must assume that the only other member is the sender and calculate its initial reporting interval accordingly.
The fraction of members who are senders (the single source) is more than 25% of the known membership (the source and this one receiver), so the reporting interval is calculated like this:

    Interval = average RTCP size * total number of members / RTCP bandwidth
             = 90 octets * 2 / (5% of 128 Kbps)
             = 180 octets / 800 octets per second
             = 0.225 seconds

Because 0.225 seconds is less than the minimum, the minimum interval of 5 seconds is used as the interval. This value is then randomized and halved because this is the first RTCP packet to be sent. Thus the first RTCP packet is sent between 1.25 and 3.75 seconds after the application is started.

During the time between starting the application and sending the first RTCP packet, several receiver reports will have been received from the other members of the session, allowing the implementation to update its estimate of the number of members. This updated estimate is used to schedule the second RTCP packet.

As we will see later, 1,000 listeners is enough that the average interval will be greater than the minimum, so the rate at which RTCP packets are received in aggregate from all listeners is 75% × 800 bytes per second ÷ 90 bytes per packet = 6.66 packets per second. If the application sends its first RTCP packet after, say, 2.86 seconds, the known audience size will be approximately 2.86 seconds × 6.66 per second = 19.

Because the fraction of senders is now less than 25% of the known membership, the reporting interval for the second packet is then calculated in this way:

    Interval = receivers * average RTCP size / (75% of RTCP bandwidth)
             = 19 * 90 / (75% of (5% of 128 Kbps))
             = 1710 / (0.75 * (0.05 * 16000 octets/second))
             = 1710 / 600
             = 2.85 seconds

Again, this value is increased to the minimum interval and randomized. The second RTCP packet is sent between 2.5 and 7.5 seconds after the first.
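The randomization and first-packet halving of the basic transmission rules can be sketched as follows. The function names are hypothetical, and the sketch takes the already-clamped reporting interval as input. It reproduces the two send windows derived in the example so far.

```python
import random


def send_window(interval, first_packet):
    """Earliest and latest delay (seconds) before the next RTCP packet."""
    low, high = 0.5 * interval, 1.5 * interval
    if first_packet:
        # The first packet is sent sooner, to announce the new member quickly.
        low, high = 0.5 * low, 0.5 * high
    return low, high


def next_rtcp_send_time(current_time, interval, first_packet, rng=random):
    """Pick a randomized send time inside the window (rng is injectable)."""
    low, high = send_window(interval, first_packet)
    return current_time + rng.uniform(low, high)


# The new listener's first two intervals both clamp to the 5-second minimum.
print(send_window(5.0, first_packet=True))   # (1.25, 3.75)
print(send_window(5.0, first_packet=False))  # (2.5, 7.5)
```

Passing the random source as a parameter is a design choice made here so that the scheduler can be driven deterministically in tests.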
The process repeats, with an average of 33 new receivers being heard from between sending the first and second RTCP packets, for a total known membership of 52. The result will be an average interval of 7.8 seconds, which, because it is greater than the minimum, is used directly. Consequently the third packet is sent between 3.9 and 11.7 seconds after the second. The average interval between packets increases as the other receivers become known, until the complete audience has been heard from. The interval is then calculated in this way:

    Interval = receivers * average RTCP size / (75% of RTCP bandwidth)
             = 1000 * 90 / (75% of (5% of 128 Kbps))
             = 90000 / (0.75 * (0.05 * 16000 octets/second))
             = 90000 / 600
             = 150 seconds

An interval of 150 seconds is equivalent to 1/150 = 0.0066 packets per second, which with 1,000 listeners gives the average RTCP reception rate of 6.66 packets per second.

The proposed standard version of RTP [6] uses only these basic transmission rules. Although these are sufficient for many applications, they have some limitations that cause problems in sessions with rapid changes in membership. The concept of reconsideration was introduced to avoid these problems.

Forward Reconsideration
As the preceding section suggested, when the session is large, it takes a certain number of reporting intervals before a new member knows the total size of the session. During this learning period, the new member is sending packets faster than the "correct" rate, because of incomplete knowledge. This issue becomes acute when many members join at once, a situation known as a step join. A typical scenario in which a step join may occur is at the start of an event, when an application starts automatically for many participants at once.

In the case of a step join, if only the basic transmission rules are used, each participant will join and schedule its first RTCP packet on the basis of an initial estimate of zero participants. It will send that packet after an average of half of the minimum interval, and it will schedule the next RTCP packet on the basis of the observed number of participants at that time, which can now be several hundreds or even thousands. Because of the low initial estimate for the size of the group, there is a burst of RTCP traffic when all participants join the session, and this can congest the network.

Rosenberg has studied this phenomenon [100] and reports on the case in which 10,000 members join a session at once. His simulations show that in such a step join, all 10,000 members try to send an RTCP packet within the first 2.5 seconds, which is almost 3,000 times the desired rate. Such a burst of packets will cause extreme network congestion, which is not the desired outcome for a low-rate control protocol.

Continually updating the estimate of the number of participants and the fraction who are senders, and then using these numbers to reconsider the send time of each RTCP packet, can solve this problem. When the scheduled transmission time arrives, the interval is recalculated on the basis of the updated estimate of the group size, and this value is used to calculate a new send time.
If the new send time is in the future, the packet is not sent but is rescheduled for that time. This procedure may sound complex, but it is actually simple to implement. Consider the pseudocode for the basic transmission rules, which can be written like this:

    if (current_time >= next_rtcp_send_time) {
        send RTCP packet
        next_rtcp_send_time = rtcp_interval() + current_time
    }

With forward reconsideration, this changes to the following:

    if (current_time >= next_rtcp_check_time) {
        new_rtcp_send_time = (rtcp_interval() / 1.21828) + last_rtcp_send_time
        if (current_time >= new_rtcp_send_time) {
            send RTCP packet
            last_rtcp_send_time = current_time
            next_rtcp_check_time = (rtcp_interval() / 1.21828) + current_time
        } else {
            next_rtcp_check_time = new_rtcp_send_time
        }
    }

Here the function rtcp_interval() returns a randomized sampling of the reporting interval, based on the current estimate of the session size. Note the division of rtcp_interval() by a factor of 1.21828 (Euler's number e minus 1.5). This is a compensating factor for the effects of the reconsideration algorithm, which would otherwise converge to a value below the desired 5% bandwidth fraction.

The effect of reconsideration is to delay RTCP packets when the estimate of the group size is increasing. This effect is shown in Figure 5.13, which illustrates that the initial burst of packets is greatly reduced when reconsideration is used, comprising only 75 packets, rather than 10,000, before the other participants learn to scale back their reporting interval.

Figure 5.13. The Effect of Forward Reconsideration on RTCP Send Rates (Adapted from J. Rosenberg and H. Schulzrinne, "Timer Reconsideration for Enhanced RTP Scalability," Proceedings of IEEE Infocom '98, San Francisco, CA, March 1998. © 1998 IEEE.)
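The forward reconsideration pseudocode translates almost line for line into runnable code. The following is a minimal sketch, not the author's implementation: the RtcpTimer class and the on_timer function are hypothetical names, and the caller is assumed to supply rtcp_interval (a function returning a randomized interval based on the current membership estimate) and send_packet.

```python
from dataclasses import dataclass

COMPENSATION = 1.21828  # e - 1.5, the compensating factor from the pseudocode


@dataclass
class RtcpTimer:
    last_send_time: float   # when we last sent (or joined, before the first send)
    next_check_time: float  # when the timer should next fire


def on_timer(timer, current_time, rtcp_interval, send_packet):
    """Run one reconsideration step; return True if a packet was sent."""
    if current_time < timer.next_check_time:
        return False  # timer has not expired yet
    # Recompute the send time from the *current* membership estimate.
    new_send_time = timer.last_send_time + rtcp_interval() / COMPENSATION
    if current_time >= new_send_time:
        send_packet()
        timer.last_send_time = current_time
        timer.next_check_time = current_time + rtcp_interval() / COMPENSATION
        return True
    # The group has grown: defer the packet and check again later.
    timer.next_check_time = new_send_time
    return False
```

A caller would invoke on_timer whenever its timer fires. If the membership estimate has grown since the packet was scheduled, the first call returns False and moves next_check_time later; the packet goes out on a subsequent call.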
As another example, consider the scenario discussed in the previous section, Basic Transmission Rules, in which a new listener is joining an established Internet radio station using multicast RTP. When the listener is joining the session, the first RTCP packet is scheduled as before, between 1.25 and 3.75 seconds after the application is started. The difference comes when the scheduled transmission time arrives: rather than sending the packet, the application reconsiders the schedule on the basis of the current estimate of the number of members. As was calculated before, assuming a random initial interval of 2.86 seconds, the application will have received about 19 RTCP packets from the other members, and a new average interval of 2.85 seconds will be calculated:

    Interval = number of receivers * average RTCP size / (75% of RTCP bandwidth)
             = 19 * 90 / (0.75 * (0.05 * 16000 octets/second))
             = 1710 / 600
             = 2.85 seconds

The result is less than the minimum, so the minimum of 5 seconds is used, randomized and divided by the scaling factor. If the resulting value is less than the current time (in this example 2.86 seconds after the application started), then the packet is sent. If not (for example, if the new randomized value is 5.97 seconds), the packet is rescheduled for the later time.

After the new timer expires (in this example 5.97 seconds after the application started), the reconsideration process takes place again. At this time the receiver will have received RTCP packets from approximately 5.97 seconds × 6.66 per second = 40 other members, and the recalculated RTCP interval will be 6 seconds before randomization and scaling. The process repeats until the reconsidered send time comes out before the current time. At that point the first RTCP packet is sent, and the second is scheduled.
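The example can be replayed with fixed random draws to check the arithmetic. The sketch below rests on several assumptions that are not in the text: reports from the other listeners are assumed to arrive at a steady 600/90 per second, the random factors 1.4546 and 0.9 are chosen by hand (the first so that the reconsidered time comes out near 5.97 seconds), and the first-packet halving is not applied during reconsideration, since a value of 5.97 seconds could not otherwise be reached.

```python
COMPENSATION = 1.21828       # e - 1.5
AVG_RTCP_SIZE = 90.0         # octets
RECEIVER_BANDWIDTH = 600.0   # octets/second: 75% of 5% of 128 Kbps
MIN_INTERVAL = 5.0           # seconds
ARRIVAL_RATE = RECEIVER_BANDWIDTH / AVG_RTCP_SIZE  # reports heard per second


def replay_first_packet(first_check, factors):
    """Return (send_time, check_times) for hand-picked random factors."""
    check_time = first_check
    checks = []
    for factor in factors:
        checks.append(check_time)
        # Receivers heard from so far, assuming a steady arrival rate.
        known = round(check_time * ARRIVAL_RATE)
        interval = max(known * AVG_RTCP_SIZE / RECEIVER_BANDWIDTH, MIN_INTERVAL)
        # Reconsidered send time, measured from application start-up.
        new_send_time = interval * factor / COMPENSATION
        if check_time >= new_send_time:
            return check_time, checks  # packet goes out now
        check_time = new_send_time     # otherwise wait and reconsider again
    raise ValueError("ran out of random factors before the packet was sent")


send_time, checks = replay_first_packet(2.86, [1.4546, 0.9])
print([round(t, 2) for t in checks], round(send_time, 2))
```

With these draws the timer fires twice, at about 2.86 and 5.97 seconds, and the packet is sent at the second check.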
Reconsideration is simple to implement, and it is recommended that all implementations include it, even though it has significant effects only after the number of participants reaches several hundred. An implementation that includes forward reconsideration will be safe no matter what size the session, or how many participants join simultaneously. One that uses only the basic transmission rules may send RTCP too often, causing network congestion in large sessions with synchronized joins.

Reverse Reconsideration
If there are problems with step joins, one might reasonably expect there to be problems due to the rapid departure of many participants (a step leave). This is indeed the case with the basic transmission rules, although the problem is not with RTCP being sent too often and causing congestion, but with it not being sent often enough, causing premature timeout of participants.

The problem occurs when most, but not all, of the members leave a large session. As a result the reporting interval decreases rapidly, perhaps from several minutes to several seconds. With the basic transmission rules, however, packets are not rescheduled after the change, although the timeout interval is updated. The result is that those members who did not leave are marked as having timed out; their packets do not arrive within the new timeout period.

The problem is solved in a similar way to that of step joins: When each BYE packet is received, the estimate of the number of participants is updated, and the send time of the next RTCP packet is reconsidered. The difference from forward reconsideration is that the estimate will be getting smaller, so the next packet is sent earlier than it would otherwise have been.

When a BYE packet is received, the new transmission time is calculated on the basis of the fraction of members still present after the BYE, and the amount of time left before the original scheduled transmission time. The procedure is as follows:

    if (BYE packet received) {
        member_fraction = num_members_after_BYE / num_members_before_BYE
        time_remaining = next_rtcp_send_time - current_time
        next_rtcp_send_time = current_time + member_fraction * time_remaining
    }

The result is a new transmission time that is earlier than the original value, but later than the current time. Packets are therefore scheduled early enough that the remaining members do not time each other out, preventing the estimate of the number of participants from erroneously falling to zero.
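The reverse reconsideration step is a single scaling operation, sketched below with a hypothetical function name and illustrative numbers.

```python
def reverse_reconsider(next_send_time, current_time,
                       members_before_bye, members_after_bye):
    """Pull the next RTCP send time earlier after a BYE is received."""
    member_fraction = members_after_bye / members_before_bye
    time_remaining = next_send_time - current_time
    # Scale the remaining wait by the fraction of members still present.
    return current_time + member_fraction * time_remaining


# One BYE arrives at t = 40 s in a 1,000-member session whose next report
# was scheduled for t = 100 s: the report moves slightly earlier.
print(reverse_reconsider(100.0, 40.0, 1000, 999))
```

Applied once per BYE, the scaling compounds: if three-quarters of the members leave before the timer fires, roughly one-quarter of the remaining wait is left.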
Implementation of reverse reconsideration is a secondary concern: It's an issue only in sessions with several hundred participants and rapid changes in membership, and failing to implement it may result in false timeouts but no networkwide problems.

BYE Reconsideration
In the proposed standard version of RTP [6], a member desiring to leave a session sends a BYE packet immediately, then exits. If many members decide to leave at once, this can cause a flood of BYE packets and can result in network congestion (much as happens with RTCP packets during a step join, if forward reconsideration is not employed).

To avoid this problem, the current version of RTP allows BYE packets to be sent immediately only if there are fewer than 50 members when a participant decides to leave. If there are more than 50 members, the leaving member should delay sending a BYE if other BYE packets are received while it is preparing to leave, a process called BYE reconsideration.

BYE reconsideration is analogous to forward reconsideration, but based on a count of the number of BYE packets received, rather than the number of other members. When a participant wants to leave a session, it suspends normal processing of RTP/RTCP packets and schedules a BYE packet according to the forward reconsideration rules, calculated as if there were no other members and as if this were the first RTCP packet to be sent. While waiting for the scheduled transmission time, the participant ignores all RTP and RTCP packets except for BYE packets. The BYE packets received are counted, and when the scheduled BYE transmission time arrives, it is reconsidered on the basis of this count. The process continues until the BYE is sent, and then the participant leaves the session.

As this description suggests, the delay before a BYE can be sent depends on the number of members leaving. If only a single member decides to leave, the BYE will be delayed between 1.026 and 3.078 seconds (based on a 5-second minimum reporting interval, halved because BYE packets are treated as if they're the initial RTCP packet). If many participants decide to leave at once, there may be a considerable delay between deciding to leave a session and being able to send the BYE packet.
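The delay bounds quoted above follow from the minimum interval, the first-packet halving, and the 1.21828 compensating factor. The sketch below shows the calculation under stated assumptions: the function name is hypothetical, the leaving participant is assumed to count itself plus every BYE heard so far as the membership, no senders are assumed (so the full RTCP bandwidth is shared), and the default packet size and bandwidth are illustrative values only.

```python
COMPENSATION = 1.21828  # e - 1.5


def bye_delay(byes_received, random_factor, avg_rtcp_size=90.0,
              rtcp_bandwidth=800.0, minimum=5.0):
    """Delay in seconds before a BYE may be sent (illustrative sketch).

    random_factor stands in for a draw from random[0.5, 1.5].
    """
    members = byes_received + 1  # leaving members heard so far, plus ourselves
    interval = max(avg_rtcp_size * members / rtcp_bandwidth, minimum)
    # Halved because the BYE is treated as an initial RTCP packet.
    return interval * random_factor * 0.5 / COMPENSATION


# A single member leaving: the two extremes of the random factor.
print(round(bye_delay(0, 0.5), 3), round(bye_delay(0, 1.5), 3))  # 1.026 3.078
```

As the count of received BYE packets grows, the computed interval rises above the minimum and the delay lengthens accordingly.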
If a fast exit is needed, it is safe to leave the session without sending a BYE; other participants will time out their state eventually.

The use of BYE reconsideration is a relatively minor decision: It is useful only when many participants leave a session at once, and when the others care about receiving notification that a participant has left. It is safe to leave large sessions without sending a BYE, rather than implementing the BYE reconsideration algorithm.

Comments on Reconsideration
The reconsideration rules were introduced to allow RTCP to scale to very large sessions in which the membership changes rapidly. I recommend that all implementations include reconsideration, even if they are initially intended only for use in small sessions; this will prevent future problems if the tool is used in a way the designer did not foresee.

On first reading, the reconsideration rules appear complex and difficult to implement. In practice, they add a small amount of additional code. My implementation of RTP and RTCP consists of about 2,500 lines of C code (excluding sockets and encryption code). Forward and reverse reconsideration together add only 15 lines of code. BYE reconsideration is more complex, at 33 lines of code, but still not a major source of difficulty.

Correct operation of the reconsideration rules depends to a large extent on the statistical average of the behavior of many individual participants. A single incorrect implementation in a large session will cause little noticeable difference to the behavior, but many incorrect implementations in a single session can lead to significant congestion problems. For small sessions, this is largely a theoretical problem, but as the session size increases, the effects of bad RTCP implementations are magnified and can cause network congestion that will affect the quality of the audio and/or video.

Common Implementation Problems
The most common problems observed with RTCP implementations relate to the basic transmission rules, and to the bandwidth calculation:
When testing the behavior of an RTCP implementation, it is important to use a range of scenarios. Problems can be found in tests of both large and small sessions, sessions in which the membership changes rapidly, sessions in which a large fraction of the participants are senders and in which few are senders, and sessions in which step joins and leaves occur.

Testing large-scale sessions is inherently difficult. If an implementation can be structured to be independent of the underlying network transport system, it will allow the simulation of large sessions on a single test machine. The IETF audio/video transport working group has produced a document describing testing strategies for RTP implementations [40], which may also be useful.
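As an illustration of simulating a large session on a single machine, the sketch below replaces the network and the clock with an event queue and counts how many first RTCP packets are sent during a step join, with and without forward reconsideration. It is a toy model under loud assumptions: every packet is delivered to every participant instantly and without loss, there are no senders, only each participant's first packet is counted, and the packet size, bandwidth, and function names are illustrative.

```python
import heapq
import random

COMPENSATION = 1.21828   # e - 1.5
AVG_RTCP_SIZE = 90.0     # octets (assumed)
RTCP_BANDWIDTH = 800.0   # octets/second (assumed)
MIN_INTERVAL = 5.0       # seconds


def first_packet_interval(known_members, rng):
    """Randomized, halved interval for a participant's first RTCP packet."""
    interval = max(known_members * AVG_RTCP_SIZE / RTCP_BANDWIDTH, MIN_INTERVAL)
    return interval * rng.uniform(0.5, 1.5) * 0.5


def first_packets_sent(members, horizon, reconsider, seed=1):
    """Count first packets sent within `horizon` seconds of a step join."""
    rng = random.Random(seed)
    # Everyone joins at t = 0 knowing only itself.
    queue = [(first_packet_interval(1, rng), pid) for pid in range(members)]
    heapq.heapify(queue)
    sent = 0
    while queue and queue[0][0] <= horizon:
        now, pid = heapq.heappop(queue)
        if reconsider:
            known = sent + 1  # members heard from so far, plus ourselves
            new_send_time = first_packet_interval(known, rng) / COMPENSATION
            if now < new_send_time:
                heapq.heappush(queue, (new_send_time, pid))  # defer the packet
                continue
        sent += 1  # packet sent; all others learn of this member instantly
    return sent


print(first_packets_sent(1000, 3.75, reconsider=False))
print(first_packets_sent(1000, 3.75, reconsider=True))
```

Because the clock is simulated, a thousand-member step join runs in a fraction of a second, and the same harness can be reused with different membership sizes or seeds. The exact counts depend on the assumptions above and should not be read as measurements of a real network.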