avt                                                     G. Franceschini
Internet Draft                                        Telecom Italia Lab
Expires: August 2007                                   February 13, 2007


                            Bandwidth Metrics
                  draft-franceschini-avt-bwmetrics-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware have
been or will be disclosed, and any of which he or she becomes aware
will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on August 13, 2007.

Abstract

When delivering audio and video data through low-capacity links, it is
important to optimally exploit the limited resources in order to
provide the best possible user experience. QoS aspects in this respect
might relate to the pure audio and video quality, as well as to delay
and lip-sync. The actual weights of the different QoS aspects depend on
the service being offered, and are out of the scope of this document.
In order to optimally exploit the limited resources, it is necessary to
get a reasonably precise measurement of them. This document proposes
metrics to address this problem. This document, in this initial draft,
focuses only on the semantic aspects, leaving out the syntactical
details.

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [1].

Table of Contents

1. Introduction
2. The starting point
   2.1. Use case: peer-to-peer audio/visual narrowband communication
   2.2. The proposed metrics
   2.3. Examples of metrics values
      2.3.1. PSTN stack including RTP/UDP/IP/PPP/LAPM, with 40 Kbit/s
             of physical capacity
      2.3.2. ATM stack including RTP/UDP/IP/PPP/ETH+FCS/LLC-
             SNAP/AAL5/ATM, with 128 Kbit/s of physical capacity
   2.4. Sender decisions affecting QoS
   2.5. Computation of the optimal transmission parameters
      2.5.1. Parameters
      2.5.2. Computation of MaxVSize
      2.5.3. Computation of MaxPTime
      2.5.4. Computation of VideoBW
   2.6. Examples of use
3. Other issues
   3.1. Narrowband and wideband communication
   3.2. Session level and media level parameters
   3.3. Point-to-point and point-to-multipoint
   3.4. TIDC or MPO updates
   3.5. RTCP support
   3.6. Middle boxes
4. Security Considerations
5. IANA Considerations
6. Conclusions
7. Acknowledgments
8. References
   8.1. Normative References
   8.2. Informative References
Author's Addresses
Intellectual Property Statement
Disclaimer of Validity
Copyright Statement
Acknowledgment
1. Introduction

In a path from A to B, there are often one or more bottlenecks that
limit the bandwidth available for the audio and/or video data.
Bottlenecks can, in principle, be present on any link, but most
commonly they are found at the access links of A and/or B. When this
occurs, knowledge about the characteristics of such bottleneck(s) is
essential to: (i) prevent congestion; (ii) exploit the available
resources at their maximum capacity.

The requirement of defining metrics that precisely characterize the
bottleneck capacity applies both to the initial negotiation phase and
to the case of updates.

It is well known that the simple indication of a bandwidth value (such
as the CT or AS bandwidth parameters) provides quite vague information,
since the protocol stack overhead remains unknown and might greatly
affect the overall bandwidth consumption (the sketch at the end of this
section quantifies this on a small example). As a consequence, the
compromise between requirements (i) and (ii) above is chosen
conservatively, sacrificing (ii) to make (i) almost guaranteed.
Sometimes this compromise results in an unnecessarily large sacrifice
of (ii).

The indication of a pair of values (one being some form of bandwidth,
and one being some indication of overhead) is definitely more
attractive, and has the potential of reducing the vagueness of the
information, thus allowing for a less severe sacrifice of (ii). Also,
since links do not necessarily present the same characteristics in the
Uplink and in the Downlink direction, the information shall relate to a
specific direction.

This document presents in section 2 a first use case (peer-to-peer
audio/visual narrowband communication), proposes new bandwidth metrics,
and shows how this use case would benefit from the signaling of such
metrics. Subsections 2.1 through 2.3 are the most relevant, while 2.4
through 2.6 are dedicated to those really interested in the details of
the algorithm. More scenarios and issues are then considered in section
3, along with proposed solutions that both address variations of the
original proposal of section 2 and highlight potential further areas of
work.
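To make the vagueness concrete, the following sketch computes the
payload bandwidth left over by a single advertised bandwidth value
under different per-packet overheads and packet rates. The 64 Kbit/s
figure, the two overhead values and the packet rates are invented for
illustration only:

   # How much payload bandwidth does a bare bandwidth value leave?
   # The answer depends on per-packet overhead and packet rate,
   # neither of which the bare value conveys.

   def payload_kbps(advertised_kbps, overhead_bytes, packets_per_sec):
       """Payload bandwidth once per-packet overhead is subtracted."""
       overhead_kbps = packets_per_sec * overhead_bytes * 8 / 1000.0
       return advertised_kbps - overhead_kbps

   for overhead in (40, 84):     # e.g. plain RTP/UDP/IP vs. a deeper stack
       for rate in (25, 50):     # packets per second
           print(overhead, rate, payload_kbps(64, overhead, rate))

   # The same 64 Kbit/s advertisement leaves anywhere between 56 and
   # about 30 Kbit/s of actual payload bandwidth.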
2. The starting point

2.1. Use case: peer-to-peer audio/visual narrowband communication

The whole work presented in this document derives from the analysis,
implementation and deployment experience conducted on a well-defined
use case: peer-to-peer audio/visual narrowband communication. For such
a scenario the basic requirement was very simple: optimize the user
experience!

2.2. The proposed metrics

When audio/video data is delivered through an IP network, a number of
protocol layers are involved, resulting in a significant overhead. The
protocol stack typically involves, from the application layer down to
the physical medium:

o  the Real-time Transport Protocol (RTP);

o  the User Datagram Protocol (UDP);

o  the Internet Protocol (IP);

o  a number of different protocol stacks featuring Data Link and
   Physical Layer functionalities, which depend on the specific
   network itself.

The overhead induced by the RTP/UDP/IP protocol layers might, in some
cases, differ significantly from the usual 40 bytes. E.g., IP tunneling
could be employed, which implies a sort of "double" IP layer and, in
essence, extra overhead for each packet. Another technique used in some
contexts is the compression of the RTP/UDP/IP headers. Other changes in
the overhead might derive from the usage of IPsec or of IPv6, and
possibly from other variations in the stack currently not envisaged.

The computation of the overhead induced by the protocol layers below
the IP level is even more complex. On some networks, data is physically
streamed according to a framing mechanism totally decoupled from the IP
packetization; in other cases, the IP packet boundaries are preserved,
but adapted with extra padding to fit the physical frames.

The overhead due to the various protocol layers can in general be
modeled as:

o  per-packet overhead, when the protocol layer preserves the packet
   boundaries of the upper layer and adds a certain overhead to each
   such data packet: the bigger the data packet, the lower the
   resulting overhead percentage (examples are RTP, UDP, IP, as well
   as ETH, PPP);

o  per-byte overhead, when the protocol layer ignores the packet
   boundaries of the upper layer and manages the data traffic as a
   stream of bytes to which it adds a certain mean overhead; e.g.:

   o  the stream of bytes is segmented into frames of a certain
      length, each having a header and/or trailer (e.g. LAP-M);

   o  the stream of bytes is coded according to some rule, e.g. escape
      bytes are inserted to avoid emulating certain sequences, whose
      occurrence can be statistically determined (e.g. again LAP-M);

   all such examples can be modeled as a fixed overhead percentage;

o  stepwise overhead, when the protocol layer preserves the packet
   boundaries of the upper layer, but encapsulates the upper layer
   data in frames of fixed length, adding padding bytes to fill the
   last frame assigned to the upper layer packet: this is the case of
   e.g. ATM AAL5, where an upper layer data packet is segmented to fit
   the payload space of an integral number of ATM cells, with 0 to 47
   bytes of padding in the last cell. This is the most difficult case
   to model, since a difference of 1 byte in the packet length at the
   application layer might result in a full additional ATM cell (53
   bytes). In practice, it is reasonable in such a case to compute a
   statistically equivalent per-byte overhead, as illustrated in the
   sketch below.
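The three models can be summarized in a few lines of code. This sketch
is purely illustrative; the frame sizes and the overhead fraction are
assumptions taken from the examples in this section:

   import math

   def per_packet(payload_len, header_len):
       # Per-packet: a fixed header/trailer added to each upper layer
       # packet (RTP, UDP, IP, PPP, ETH).
       return payload_len + header_len

   def per_byte(payload_len, overhead_fraction):
       # Per-byte: a mean percentage added to the byte stream,
       # regardless of packet boundaries (e.g. LAP-M framing and
       # escaping).
       return payload_len * (1 + overhead_fraction)

   def stepwise(payload_len, frame_payload=48, frame_len=53):
       # Stepwise: a whole number of fixed-size frames, with padding
       # in the last one (ATM/AAL5 with 53-byte cells, 48 of payload).
       return math.ceil(payload_len / frame_payload) * frame_len

   # One extra byte can cost a full extra ATM cell:
   print(stepwise(48), stepwise(49))   # 53 vs. 106 bytes on the wire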
Thus, in general, the computation of the overhead induced by the
various protocol layers on an audio or video flow is not trivial; it
definitely depends on how big the packets generated at the application
layer are, and on how these packets are handled by the different
protocol layers.

Since the overhead depends on the protocol stack, it can potentially
change on each link of the delivery path. The most important overhead
is clearly the one associated with the bottleneck link.

In a path from A to B, assuming that the two potential bottlenecks are
the links closest to peer A and peer B themselves, the sender (A) needs
information about the characteristics of the local UplinkA and of the
remote DownlinkB. Once such a characterization is available at sender
A, it can identify the best compromises between e2e delay and overhead,
and determine the values for the maxptime, maxvsize and videobw
parameters defined in section 2.4 below: in essence, sender A will be
able to make its own best decision on how to (i) prevent congestion and
at the same time (ii) exploit the available resources at their maximum
capacity.

The proposed metrics measure the bandwidth available at a certain
protocol stack layer and the corresponding mean per-packet overhead:

o  TIDC/TIUC (Transport Independent Downlink/Uplink Capacity):
   represents the capacity (in Kbit/s) of the link
   (Downstream/Upstream direction) measured at a certain protocol
   stack layer.

o  MPO (Mean Packet Overhead): represents the mean overhead that
   applies to each packet from the layer above RTP down to the
   protocol stack layer at which the corresponding TIxC value has been
   measured. The overhead excludes any bytes in excess of the
   canonical 12-byte RTP header; thus RTP header extensions are not
   included, and neither are the Payload Format overheads.

2.3. Examples of metrics values

2.3.1. PSTN stack including RTP/UDP/IP/PPP/LAPM, with 40 Kbit/s of
physical capacity

The contributions of the various layers to the per-packet overhead are:
RTP (12), UDP (8), IP (20), PPP (8). In addition, LAPM introduces a
per-byte overhead of about 10% (this figure depends on some parameters
of LAPM itself, but for the purposes of the example let's assume 10%).
This means that 40 - 10% = 36 Kbit/s are left to the PPP layer. Also,
the total per-packet overhead at the PPP layer is:

   12 + 8 + 20 + 8 = 48 bytes

For such a case an optimal tuple would therefore be:

   TIDC(PPP) = 36
   MPOD(PPP) = 48

The above tuple refers to the PPP layer. If the IP layer were selected
instead, the MPO figure would clearly decrease to 40. However, it is
not obvious how to compute the TIDC for the IP layer, since the PPP
layer introduces a per-packet overhead, not a per-byte overhead. Thus,
a hypothesis on the packet rate is required, say 50 packets/second.
This translates into 3200 bit/s of overhead, i.e. ~3 Kbit/s. The tuple
therefore becomes:

   TIDC(IP,50) = 33
   MPOD(IP)    = 40

This tuple, relative to the IP layer, however also implies a packet
rate of 50 packets/sec, whereas the tuple relative to the PPP layer
made no assumption on the packet rate.

Using the same approach and the hypothesis of 50 packets/sec, the
tuples relative to the UDP, RTP and application layers can be computed
as well:

   TIDC(UDP,50) = 25
   MPOD(UDP)    = 20

   TIDC(RTP,50) = 22
   MPOD(RTP)    = 12

   TIDC(APP,50) = 17
   MPOD(APP)    = 0

All five tuples are equivalent as long as the hypothesis of 50
packets/sec is correct. But the more reality diverges from the
hypothesis, the more inaccurate the information provided by the APP,
RTP, UDP and IP tuples becomes, while the PPP tuple remains correct,
since it is independent of the packet rate assumption. In other words,
if the measurement is made at the interface between the lowest protocol
stack layer introducing per-packet overhead and the highest protocol
stack layer introducing per-byte overhead, that measurement is accurate
under all packet rates. Otherwise, it is only precise when the packet
rate hypothesis approximates reality well.

Indeed, if the packet rate hypothesis changes to 25 packets/sec, the
new figures become:

   TIDC(IP,25)  = 34
   MPOD(IP)     = 40

   TIDC(UDP,25) = 30
   MPOD(UDP)    = 20

   TIDC(RTP,25) = 29
   MPOD(RTP)    = 12

   TIDC(APP,25) = 26
   MPOD(APP)    = 0
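The arithmetic behind these tuples can be restated as a short sketch:
start from the rate-independent PPP tuple and "peel off" one header per
layer, charging its cost at the assumed packet rate. The values below
are simply the ones of this example, not normative figures:

   # PSTN example: derive rate-dependent tuples from the PPP tuple.
   # Capacities in Kbit/s, overheads in bytes. Each entry names the
   # layer reached and the header peeled off to get there.
   PEELED = [("IP", 8),    # PPP header no longer counted as overhead
             ("UDP", 20),  # IP header
             ("RTP", 8),   # UDP header
             ("APP", 12)]  # RTP header

   def tuples_above_ppp(rate, tidc_ppp=36.0, mpo_ppp=48):
       tidc, mpo = tidc_ppp, mpo_ppp
       out = {}
       for layer, hdr in PEELED:
           tidc -= rate * hdr * 8 / 1000.0  # peeled header becomes used bandwidth
           mpo -= hdr
           out[layer] = (round(tidc), mpo)
       return out

   print(tuples_above_ppp(50))
   # {'IP': (33, 40), 'UDP': (25, 20), 'RTP': (22, 12), 'APP': (17, 0)}
   print(tuples_above_ppp(25))
   # {'IP': (34, 40), 'UDP': (30, 20), 'RTP': (29, 12), 'APP': (26, 0)}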
2.3.2. ATM stack including RTP/UDP/IP/PPP/ETH+FCS/LLC-SNAP/AAL5/ATM,
with 128 Kbit/s of physical capacity

The contributions of the various layers to the per-packet overhead are:
RTP (12), UDP (8), IP (20), PPP (8), ETH+FCS (18), LLC/SNAP (10), AAL5
(8). In addition, the ATM layer introduces a fixed per-byte overhead of
about 10% (5 header bytes for every 48 payload bytes) plus a
statistical per-byte overhead of, say, another 10% (due to the 0-47
bytes of padding in the last cell belonging to an AAL5 PDU). This means
that 128 - 20% = 102 Kbit/s are left to the AAL5 layer. Also, the total
per-packet overhead at the AAL5 layer is:

   12 + 8 + 20 + 8 + 18 + 10 + 8 = 84 bytes

For such a case an optimal tuple would therefore be:

   TIDC(AAL5) = 102
   MPOD(AAL5) = 84

As in the previous example, here as well it is possible to compute
alternate tuples based on a hypothesis on the packet rate; say, again,
50 packets/sec. We then obtain:

   TIDC(IP,50) = 84
   MPOD(IP)    = 40

   TIDC(UDP,50) = 76
   MPOD(UDP)    = 20

   TIDC(RTP,50) = 73
   MPOD(RTP)    = 12

   TIDC(APP,50) = 68
   MPOD(APP)    = 0

While for 25 packets/sec the figures become:

   TIDC(IP,25) = 93
   MPOD(IP)    = 40

   TIDC(UDP,25) = 89
   MPOD(UDP)    = 20

   TIDC(RTP,25) = 88
   MPOD(RTP)    = 12

   TIDC(APP,25) = 85
   MPOD(APP)    = 0
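The same peeling sketch applies to this stack; the only differences are
the starting tuple and the fact that matching the figures above
requires truncating (rather than rounding) to whole Kbit/s. Again, this
is only a restatement of the example's arithmetic:

   # ATM example: derive rate-dependent tuples from the AAL5 tuple.
   # Working in bit/s integers keeps the truncation exact.
   PEELED = [("IP", 8 + 18 + 10 + 8),  # PPP, ETH+FCS, LLC/SNAP, AAL5
             ("UDP", 20),              # IP header
             ("RTP", 8),               # UDP header
             ("APP", 12)]              # RTP header

   def tuples_above_aal5(rate, tidc_aal5_bps=102400, mpo_aal5=84):
       # 102400 bit/s = 128 Kbit/s minus the ~20% ATM per-byte overhead
       tidc, mpo = tidc_aal5_bps, mpo_aal5
       out = {}
       for layer, hdr in PEELED:
           tidc -= rate * hdr * 8
           mpo -= hdr
           out[layer] = (tidc // 1000, mpo)  # truncate to whole Kbit/s
       return out

   print(tuples_above_aal5(50))
   # {'IP': (84, 40), 'UDP': (76, 20), 'RTP': (73, 12), 'APP': (68, 0)}
   print(tuples_above_aal5(25))
   # {'IP': (93, 40), 'UDP': (89, 20), 'RTP': (88, 12), 'APP': (85, 0)}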
2.4. Sender decisions affecting QoS

For the peer-to-peer audio/visual narrowband communication scenario,
three encoding and transmission parameters have been identified as
crucial for the overall QoS, and mostly determined by a sender-only
decision:

o  maxptime: maximum length (in ms) of the audio content transmitted
   in a single audio RTP packet;

o  maxvsize: maximum length (in bytes) of a video RTP packet;

o  videobw: bandwidth assigned to the video stream at the media level.

These parameters affect the QoS as follows:

maxptime: small values imply low acquisition, serialization and
rendering delays, and thus an overall low e2e delay. On the other hand,
the generation of many small RTP packets causes a significantly high
transmission overhead that erodes the bandwidth available for the video
stream. This might be less evident when RTP header compression
techniques are in place. In general, the sender shall identify an
acceptable compromise between e2e delay and overhead, although there
are cases where the value is quite constrained.

maxvsize: small values allow for a fine interleaving between audio and
video packets, thus keeping low the jitter induced on the audio
delivery by the serialization of the video RTP packets. Such jitter
directly contributes to the audio e2e delay. On the other hand, the
generation of many small RTP packets causes a significantly high
transmission overhead that erodes the bandwidth available for the video
stream. Here again the sender shall identify an acceptable compromise
between audio e2e delay and overhead. There is of course an upper-bound
constraint determined by the path MTU size.

videobw: small values allow for a conservative approach in which the
path capacity is under-utilized, thereby preventing congestion in case
of transient problems (e.g. retransmissions). On the other hand, the
video quality is directly proportional to the video bandwidth. The
sender shall identify the highest possible value that still keeps the
risk of congestion acceptable.

Of course many other parameters could be considered, such as the audio
bandwidth itself (e.g. AMR coding can range from 4.75 up to 12.2
Kbit/s), but it is believed that the three parameters above are the
ones that deserve the main attention.

2.5. Computation of the optimal transmission parameters

2.5.1. Parameters

Here is a possible algorithm for computing the optimal transmission
parameters. A number of constraints are set: this is an example, and
the mechanism by which the sender is made aware of the values of such
constraints is outside the scope of this document.

The parameters to compute are:

o  MaxPTime: maximum length (in ms) of the audio content transmitted
   in a single audio RTP packet;

o  MaxVSize: maximum length (in bytes) of a video RTP packet;

o  VideoBW: bandwidth assigned to the video stream at the media level.

The parameters providing the link characterization are:

o  TIDC: the remote Downlink capacity in bit/s;

o  MPOD: the remote Downlink mean packet overhead in bytes;

o  TIUC: the local Uplink capacity in bit/s;

o  MPOU: the local Uplink mean packet overhead in bytes.

In the following they will often be referred to as TIxC and MPOx,
meaning either the remote Downlink or the local Uplink.

Furthermore, the following constraints are set:

o  MTUSIZE: maximum IP packet length;

o  MAX_JITTER_INT: maximum interleaving jitter (in ms), computed as
   the difference between the minimum and maximum delay of an audio
   packet due to the interleaving of video packets;

o  MIN_VIDEO_BW: the minimum bandwidth to dedicate to the video stream
   at the media level;

o  AUDIO_BW: the bandwidth to dedicate to the audio stream at the
   media level;

o  MAX_AUDIOLENMS: the maximum value for MaxPTime;

o  MIN_AUDIOLENMS: the minimum value for MaxPTime;

o  AUDIOFRAMELENMS: the audio frame length.

2.5.2. Computation of MaxVSize

This parameter shall assume the highest possible value (to reduce
overhead), coherently with the following constraints:

o  it shall be small enough to avoid causing an interleaving jitter
   greater than MAX_JITTER_INT;

o  it shall be small enough to avoid causing IP fragmentation
   (coherently with MTUSIZE).

The minimum delay of an audio packet induced by the interleaving of
video packets is clearly 0. The interleaving jitter thus equals the
maximum delay of an audio packet induced by the interleaving of video
packets. This, in turn, can reasonably be computed as the serialization
time of the longest video packet across the two known links (UplinkA
and DownlinkB). Thus (all terms in bytes, ms and bits/ms):

   MAX_JITTER_INT = (MaxVSize+MPOD)*8/TIDC + (MaxVSize+MPOU)*8/TIUC

Assuming that MAX_JITTER_INT is known (constraint), and that MPOD,
TIDC, MPOU and TIUC are known as well, it is possible to obtain
MaxVSize:

   MaxVSize = [(MAX_JITTER_INT * TIDC * TIUC / 8)
               - (MPOD * TIUC) - (MPOU * TIDC)] / (TIUC + TIDC)

If no audio is involved, MaxVSize does not, of course, require any such
calculation. MaxVSize shall then respect the IP requirement of not
causing any IP fragmentation, and (notwithstanding all possible
variations of the RTP/UDP/IP stack) a typical formula would simply be:

   MaxVSize <= MTUSIZE - 40
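The MaxVSize derivation can be transcribed directly. This is a sketch
only; the example link values at the end are invented for illustration:

   def max_vsize(max_jitter_int, tidc, tiuc, mpod, mpou, mtusize):
       # All terms as in section 2.5.2: capacities in bits/ms,
       # overheads in bytes, jitter in ms; returns bytes.
       from_jitter = ((max_jitter_int * tidc * tiuc / 8.0)
                      - (mpod * tiuc) - (mpou * tidc)) / (tiuc + tidc)
       from_mtu = mtusize - 40  # no IP fragmentation, plain RTP/UDP/IP
       return int(min(from_jitter, from_mtu))

   # Invented example: symmetric 128 Kbit/s links (128 bits/ms),
   # 84-byte overheads, 100 ms tolerated jitter, 1500-byte MTU.
   print(max_vsize(100, 128, 128, 84, 84, 1500))   # -> 716
   # Check: (716+84)*8/128 = 50 ms per link, 100 ms across both links.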
2.5.3. Computation of MaxPTime

This parameter shall assume the lowest possible value (to reduce e2e
delay), coherently with the following constraints:

o  it shall correspond to an integral multiple of AUDIOFRAMELENMS
   (e.g. Nx10 ms for G.729, or Nx20 ms for AMR);

o  it shall not be bigger than MAX_AUDIOLENMS;

o  it shall not be smaller than MIN_AUDIOLENMS;

o  it shall be big enough not to erode excessive bandwidth from the
   video payload, which is required to reach at least MIN_VIDEO_BW.

The available bandwidth (TIUC or TIDC) can be assigned to audio and
video payload and overhead as in the formula below:

   TIxC >= BW_V + BW_A + OVx_V + OVx_A

Where:

o  BW_V is the pure video bandwidth, which is required to be >
   MIN_VIDEO_BW;

o  BW_A is the pure audio bandwidth, which is known to be AUDIO_BW;

o  OVx_V and OVx_A are the bandwidths consumed by the overheads for
   audio and video.

By 'x' is meant either 'D' (remote Downlink) or 'U' (local Uplink).
OVx_V can be approximated, MaxVSize being known, as a function of BW_V
(all terms in bytes, ms and bits/ms):

   OVx_V =