RTP: Audio and Video for the Internet

2017-07-07 02:10:07

Each application in an RTP session will maintain a database of information about the participants and about the session itself. The session information, from which the RTCP timing is derived, can be stored as a set of variables:

The RTP bandwidth: that is, the typical session bandwidth, configured when the application starts.

The RTCP bandwidth fraction: that is, the percentage of the RTP bandwidth devoted to RTCP reports. This is usually 5%, but profiles may define a means of changing this (0% also may be used, meaning that RTCP is not sent).

The average size of all RTCP packets sent and received by this participant.

The number of members in the session, the number of members when this participant last sent an RTCP packet, and the fraction of those who have sent RTP data packets during the preceding reporting interval.

The time at which the implementation last sent an RTCP packet, and the next scheduled transmission time.

A flag indicating whether the implementation has sent any RTP data packets since sending the last two RTCP packets.

A flag indicating whether the implementation has sent any RTCP packets at all.

In addition, the implementation needs to maintain variables to include in RTCP SR packets:

The number of packets and octets of RTP data it has sent.

The last sequence number it used.

The correspondence between the RTP clock it is using and an NTP-format timestamp.

A session data structure containing these variables is also a good place to store the SSRC being used, the SDES information for the implementation, and the file descriptors for the RTP and RTCP sockets. Finally, the session data structure should contain a database for information held on each participant.

In terms of implementation, the session data can be stored simply: a single structure in a C-based implementation, a class in an object-oriented system. With the exception of the participant-specific data, each variable in the structure or class is a simple type: integer, text string, and so on. The format of the participant-specific data is described next.
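As a rough illustration of the session variables listed above, the following sketch gathers them into one Python dataclass. All field names are hypothetical, the bandwidth unit (octets per second) is an assumption, and the initial member count of 1 assumes the implementation counts itself as a member.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SessionState:
    """Session-level RTP/RTCP state. Field names are illustrative only."""
    # Inputs to the RTCP timing calculation
    rtp_bandwidth: float            # typical session bandwidth (assumed octets/second)
    rtcp_fraction: float = 0.05     # share of the bandwidth for RTCP; 0 means no RTCP
    avg_rtcp_size: float = 0.0      # average size of RTCP packets sent and received
    members: int = 1                # current member count (assumes we count ourselves)
    pmembers: int = 1               # member count when we last sent an RTCP packet
    sender_fraction: float = 0.0    # fraction of members that sent RTP data recently
    last_rtcp_time: float = 0.0     # time we last sent an RTCP packet
    next_rtcp_time: float = 0.0     # next scheduled RTCP transmission time
    we_sent: bool = False           # sent RTP data since the last two RTCP packets
    sent_any_rtcp: bool = False     # whether any RTCP packet has been sent at all
    # Values reported in RTCP SR packets
    packets_sent: int = 0
    octets_sent: int = 0
    last_seq: int = 0
    rtp_ntp_mapping: Optional[Tuple[int, int]] = None  # (RTP timestamp, NTP timestamp)
    # Other session-wide items
    ssrc: int = 0
    sdes: Dict[str, bytes] = field(default_factory=dict)
    rtp_fd: int = -1                # file descriptor of the RTP socket
    rtcp_fd: int = -1               # file descriptor of the RTCP socket
    participants: Dict[int, object] = field(default_factory=dict)

    @property
    def rtcp_bandwidth(self) -> float:
        """Bandwidth available to RTCP, derived from the two configured values."""
        return self.rtp_bandwidth * self.rtcp_fraction


session = SessionState(rtp_bandwidth=8000.0)
print(session.rtcp_bandwidth)
```

Deriving the RTCP bandwidth from the two stored values, rather than storing it separately, keeps the structure consistent if a profile changes the fraction.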

To generate RTCP packets properly, each participant also needs to maintain state for the other members in the session. A good design makes the participant database an integral part of the operation of the system, holding not just RTCP-related information, but all state for each participant. The per-participant data structure may include the following:

SSRC identifier.

Source description information: the CNAME is required; other information may be included (note that these values are not null-terminated, and care must be taken in their handling).

Reception quality statistics (packet loss and jitter), to allow generation of RTCP RR packets.

Information received from sender reports, to allow lip synchronization (see Chapter 7).

The last time this participant was heard from so that inactive participants can be timed out.

A flag indicating whether this participant has sent data within the current RTCP reporting interval.

The media playout buffer, and any codec state needed (see Chapter 6, Media Capture, Playout, and Timing).

Any information needed for channel coding and error recovery: for example, data awaiting reception of repair packets before it can be decoded (see Chapters 8, Error Concealment, and 9, Error Correction).
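The per-participant items listed above might be collected as in the following sketch. The field names are hypothetical, and the buffers are shown as plain lists only to keep the example short. SDES values are held as raw bytes with an explicit length, which avoids any reliance on null termination.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Participant:
    """State held for one session member. Field names are illustrative only."""
    ssrc: int
    sdes: Dict[str, bytes] = field(default_factory=dict)  # raw bytes, never null-terminated
    packets_lost: int = 0                 # reception quality, for RTCP RR packets
    jitter: float = 0.0
    last_sr: Optional[Tuple[int, int]] = None  # (NTP, RTP) timestamps from the last SR
    last_heard: float = 0.0               # used to time out inactive participants
    is_sender: bool = False               # sent data in the current reporting interval
    playout_buffer: List[bytes] = field(default_factory=list)
    pending_repair: List[bytes] = field(default_factory=list)  # awaiting repair packets

    def cname(self) -> Optional[str]:
        """Decode the CNAME defensively; SDES text is UTF-8 but may be malformed."""
        raw = self.sdes.get("CNAME")
        return None if raw is None else raw.decode("utf-8", errors="replace")


p = Participant(ssrc=0x1234ABCD, sdes={"CNAME": b"alice@example.com"})
print(p.cname())
```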

Within an RTP session, members are identified by their synchronization source identifier. Because there may be many participants and they may need to be accessed in any order, the appropriate data structure for the participant database is a hash table, indexed by SSRC identifier. In applications that deal with only a single media format, this is sufficient. However, lip synchronization also requires the capability to look up sources by their CNAME. As a result, the participant database should be indexed by a double hash table: once by SSRC and once by CNAME.
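A minimal sketch of such a doubly indexed database follows, using Python dictionaries as the hash tables. The class and method names are assumptions made for illustration. Several SSRCs can map to one CNAME, which is the case that matters for lip synchronization: the audio and video streams of one participant share a CNAME.

```python
from typing import Dict, List, Optional


class ParticipantDB:
    """Participant entries indexed by SSRC and, once known, by CNAME."""

    def __init__(self) -> None:
        self._by_ssrc: Dict[int, dict] = {}
        self._by_cname: Dict[str, List[dict]] = {}

    def add(self, ssrc: int) -> dict:
        """Return the entry for this SSRC, creating it if necessary."""
        return self._by_ssrc.setdefault(ssrc, {"ssrc": ssrc, "cname": None})

    def set_cname(self, ssrc: int, cname: str) -> None:
        """Record the CNAME from an SDES packet and update the second index."""
        entry = self._by_ssrc[ssrc]
        old = entry["cname"]
        if old == cname:
            return
        if old is not None:
            # The CNAME changed: drop the stale index entry first.
            self._by_cname[old].remove(entry)
            if not self._by_cname[old]:
                del self._by_cname[old]
        entry["cname"] = cname
        self._by_cname.setdefault(cname, []).append(entry)

    def by_ssrc(self, ssrc: int) -> Optional[dict]:
        return self._by_ssrc.get(ssrc)

    def by_cname(self, cname: str) -> List[dict]:
        return list(self._by_cname.get(cname, []))


db = ParticipantDB()
db.add(0x11111111)   # for example, an audio stream
db.add(0x22222222)   # for example, the matching video stream
db.set_cname(0x11111111, "alice@example.com")
db.set_cname(0x22222222, "alice@example.com")
print([hex(e["ssrc"]) for e in db.by_cname("alice@example.com")])
```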

Some implementations use less-than-perfect random number generators when choosing their SSRC identifier. This means that a simple hashing function (for example, using the lowest few bits of the SSRC as an index into a table) can lead to unbalanced and inefficient operation. Even though SSRC values are supposed to be random, they should be used with an efficient hashing function. Some have suggested using the MD5 hash of the SSRC as the basis for the index, although that may be considered overkill.
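To illustrate the problem, the sketch below compares a low-bits index with a simple multiplicative hash on a deliberately poor set of SSRC values whose low eight bits are all zero. The table size and the choice of multiplicative hashing are assumptions made for this example; the text does not prescribe a particular function.

```python
TABLE_BITS = 8  # 256 buckets; a hypothetical table size


def low_bits_index(ssrc: int) -> int:
    """Naive index: the lowest few bits of the SSRC."""
    return ssrc & ((1 << TABLE_BITS) - 1)


def mixed_index(ssrc: int) -> int:
    """Multiplicative hash: multiply by a large odd constant modulo 2**32
    and keep the top bits, so every input bit influences the index."""
    return ((ssrc * 2654435761) & 0xFFFFFFFF) >> (32 - TABLE_BITS)


# SSRCs from a weak generator: only the upper bits vary.
ssrcs = [i << 8 for i in range(64)]

print("buckets used, low bits:", len({low_bits_index(s) for s in ssrcs}))
print("buckets used, mixed:   ", len({mixed_index(s) for s in ssrcs}))
```

With the low-bits index every one of these sources lands in the same bucket, so lookups degrade to a linear search; the mixed index spreads them across the table at the cost of one multiplication.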

Participants should be added to the database after a validated packet has been received from them. The validation step is important: an implementation should not create state for a participant unless it is certain that the participant is valid. Here are some guidelines:

If an RTCP packet is received and validated, the participant should be entered into the database. The validity checks on RTCP are strong, and it is difficult for bogus packets to satisfy them.

An entry should not be made on the basis of RTP packets only, unless multiple packets are received with consecutive sequence numbers. The validity checks possible for a single RTP packet are weak, and it is possible for a bogus packet to satisfy the tests yet be invalid.

This implies that the implementation should maintain an additional, lightweight table of probationary sources (sources from which only a single RTP packet has been received). To prevent bogus sources of RTP and RTCP data from using too much memory, this table should be aggressively timed out and should have a fixed maximum size. It is difficult to protect against an attacker who purposely generates many different sources to use up all memory of the receivers, but these precautions will prevent accidental exhaustion of memory if a misdirected non-RTP stream is received.

Each CSRC (contributing source) in a valid RTP packet also counts as a participant and should be added to the database. You should expect to receive SDES information for participants identified only by CSRC.
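The probationary table described in these guidelines could be sketched as follows. The size limit, the timeout, and the number of consecutive packets required are hypothetical values chosen for the example, and the caller is assumed to pass a monotonically increasing time. A source whose RTCP packet validates would bypass this table and be entered directly.

```python
from collections import OrderedDict

MAX_PROBATION = 64        # hypothetical fixed maximum table size
PROBATION_TIMEOUT = 2.0   # hypothetical timeout, in seconds
MIN_SEQUENTIAL = 2        # hypothetical number of consecutive packets required


class ProbationTable:
    """Sources seen in RTP only, not yet trusted enough for the main database."""

    def __init__(self) -> None:
        # ssrc -> (expected next sequence number, packets in sequence, last seen)
        self._table = OrderedDict()

    def __len__(self) -> int:
        return len(self._table)

    def on_rtp(self, ssrc: int, seq: int, now: float) -> bool:
        """Return True once the source has sent enough consecutive packets."""
        self._expire(now)
        entry = self._table.pop(ssrc, None)
        if entry is not None and seq == entry[0]:
            count = entry[1] + 1
        else:
            count = 1                         # first packet, or the sequence broke
        if count >= MIN_SEQUENTIAL:
            return True                       # caller adds it to the main database
        if len(self._table) >= MAX_PROBATION:
            self._table.popitem(last=False)   # evict the least recently seen source
        # RTP sequence numbers are 16 bits wide and wrap around.
        self._table[ssrc] = ((seq + 1) & 0xFFFF, count, now)
        return False

    def _expire(self, now: float) -> None:
        # Entries are re-inserted on every update, so the oldest come first.
        while self._table:
            ssrc, (_, _, seen) = next(iter(self._table.items()))
            if now - seen <= PROBATION_TIMEOUT:
                break
            del self._table[ssrc]


table = ProbationTable()
print(table.on_rtp(0xCAFE, 100, 0.0), table.on_rtp(0xCAFE, 101, 0.02))
```

Capping the table and evicting the oldest entry bounds memory use no matter how many distinct SSRC values arrive, which is the property the text asks for.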

When a participant is added to the database, an application should also update the session-level count of the members and the sender fraction. Addition of a participant may also cause RTCP forward reconsideration, which will be discussed shortly.

Participants are removed from the database after a BYE packet is received or after a specified period of inactivity. This sounds simple, but there are several subtle points.

There is no guarantee that packets are received in order, so an RTCP BYE may be received before the last data packet from a source. To prevent state from being torn down and then immediately reestablished, a participant should be marked as having left after a BYE is received, and its state should be held over for a few seconds (my implementation uses a fixed two-second delay). The important point is that the delay is longer than both the maximum expected reordering and the media playout delay, thereby allowing for late packets and for any data in the playout buffer for that participant to be used.

Sources may be timed out if they haven't been heard from for more than five times the reporting interval. If the reporting interval is less than 5 seconds, the 5-second minimum is used here (even if a smaller interval is used when RTCP packets are being sent).

When a BYE packet is received or when a member times out, RTCP reverse reconsideration takes place, as described in the section titled BYE Reconsideration later in this chapter.
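The two removal rules above can be sketched as a periodic sweep over the database. The two-second hold is the example value given in the text; the dictionary layout and function names are assumptions for illustration. RTCP reverse reconsideration, which would follow each removal, is not shown.

```python
BYE_HOLD = 2.0        # seconds to keep state after a BYE (example value from the text)
MIN_INTERVAL = 5.0    # minimum reporting interval used for timeouts, in seconds
TIMEOUT_MULTIPLE = 5  # time out after this many reporting intervals


def timeout_period(reporting_interval: float) -> float:
    """Inactivity limit: five intervals, with the 5-second minimum applied."""
    return TIMEOUT_MULTIPLE * max(reporting_interval, MIN_INTERVAL)


def sweep(participants: dict, now: float, reporting_interval: float) -> list:
    """Delete participants whose BYE hold has elapsed or who have gone silent.

    `participants` maps SSRC to a dict with 'last_heard' and 'bye_time'
    (None until a BYE arrives). Returns the list of removed SSRCs.
    """
    limit = timeout_period(reporting_interval)
    removed = []
    for ssrc, p in list(participants.items()):
        left = p["bye_time"]
        bye_expired = left is not None and now - left >= BYE_HOLD
        timed_out = now - p["last_heard"] > limit
        if bye_expired or timed_out:
            removed.append(ssrc)
            del participants[ssrc]
    return removed


members = {
    1: {"last_heard": 0.0, "bye_time": None},    # silent since t=0
    2: {"last_heard": 10.0, "bye_time": 10.0},   # sent a BYE at t=10
}
print(sweep(members, 11.0, 1.0), sweep(members, 12.0, 1.0), sweep(members, 26.0, 1.0))
```

Marking the participant with a BYE time instead of deleting it immediately lets late data packets find existing state rather than creating a new entry.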
