IPFIX Working Group A. Akhter Internet-Draft Cisco Systems Intended status: Standards Track October 17, 2010 Expires: April 20, 2011 Information Elements for Flow Performance Measurement draft-akhter-ipfix-perfmon-00.txt Abstract There is a need to be able to quantify and report the performance of network applications and the network service in handling user data. This performance data provides information essential in validating service level agreements, fault isolation as well as early warnings of greater problems. This document describes IPFIX Information Elements related to performance measurement of network based applications. In addition, to the performance information several non-metric information elements are also included to provide greater context to the reports. The measurements use audio/video applications as a base but are not restricted to these class of applications. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 20, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Akhter Expires April 20, 2011 [Page 1] Internet-Draft IPFIX PerfMon October 2010 publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. General Usage . . . . . . . . . . . . . . . . . . . . . . . . 5 3.1. Quality of Service (QoS) Monitoring . . . . . . . . . . . 5 3.2. Service Level Agreemnt (SLA) Validation . . . . . . . . . 5 3.3. Fault Isolation and Troubleshooting . . . . . . . . . . . 6 4. New Information Elements . . . . . . . . . . . . . . . . . . . 6 4.1. Transport Layer . . . . . . . . . . . . . . . . . . . . . 6 4.1.1. perfPacketLoss . . . . . . . . . . . . . . . . . . . . 6 4.1.2. perfPacketExpected . . . . . . . . . . . . . . . . . . 8 4.1.3. perfPacketLossRate . . . . . . . . . . . . . . . . . . 9 4.2. User and Application Layer . . . . . . . . . . . . . . . . 10 4.2.1. perfSessionSetupDelay . . . . . . . . . . . . . . . . 10 4.3. Contextual Elements . . . . . . . . . . . . . . . . . . . 11 4.3.1. mediaRTPSSRC . . . . . . . . . . . . . . . . . . . . . 11 4.3.2. mediaRTPPayloadType . . . . . . . . . . . . . . . . . 12 5. Security Considerations . . . . . . . . . . . . . . . . . . . 13 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 7.1. Normative References . . . . . . . . . . . . . . . . . . . 14 7.2. Informative References . . . . . . . . . . . . . . . . . . 14 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16 Akhter Expires April 20, 2011 [Page 2] Internet-Draft IPFIX PerfMon October 2010 1. Introduction Today's networks support a multitude of highly demanding and sensitive network applications. Network issues are readily apparent by the users of these applications due to the sensitivity of these applications to impaired network conditions. Examples of these network applications include applications making use of IP based audio, video, database transactions, virtual desktop interface (VDI), online gaming, cloud services and many more. In some cases the impaired application translates directly to loss of revenue. In other cases, there may be regulatory or contractual service level agreements that motivate the network operator. Due to the sensitive of these types of applications to impaired service it leaves a poor impression of the service on the user-- regardless of the actual performance of the network itself. In the case of an actual problem within the network service, monitoring the performance may yield a early indicator of a much more serious problem. Due to the demanding and sensitive nature of these applications, network operators have tried to engineer their networks in an attempt to wring better and differentiated performance. However, that same differentiated design prevents network operators from extrapolating observational data from one application to another, or from one set of synthetic (active test) test traffic to actual application performance. Performance measurements on user data provide greater visibility not only into the quality of experience of the end users but also visibility into network health. With regards to network health, as flow performance is being measured, there will be visibility into the end to end performance which means that not only visibility into local network health, but also viability into remote network health. If these measurements are made at multiple points within the network (or between the network and end device) then there is not only identification that there might be an issue, but a span of area can be established where the issue might be. The resolution of the fault increases with the number of measurement points along the flow path. The IP Flow Information Export Protocol (IPFIX) [RFC5101] provides new levels of flexibility in reporting from measurement points across the life cycle of a network based application. IPFIX can provide granular results in terms of flow specificity as well as time granularity. At the same time, IPFIX allows for summarization of data along different types of boundaries for operators that are unconcerned about specific sessions but about health of a service or a portion of the network. Where possible, an attempt has been made to make use of existing Akhter Expires April 20, 2011 [Page 3] Internet-Draft IPFIX PerfMon October 2010 definitions of metrics ([RFC4710]) and if needed, clarify and expand on them to widen their usage with additional applications. As this document also covers the reporting of these metrics via IPFIX, consideration is taken with mapping the metric's capabilities and context with the IPFIX information and data representation model. 2. Terminology Terms used in this document that are defined in the Terminology section of the IPFIX Protocol [RFC5101] document are to be interpreted as defined there. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. In addition, the information element definitions use the following terms: Name: Name of the information element per the IPFIX rules defined in Section 2.3 of [RFC5102] Description: Short description of what the information element is trying to convey. Observation Point: Where the measurement is meant to be performed. Either at an intermediate point (for example, a router) or end system. Element Data Type: The IPFIX informationElementDataTypeas defined in Section 3.1 of [RFC5610] Element Semantics: The IPFIX informationElementSemantics as defined in section Section 3.6 of [RFC5610] Element Units: The IPFIX informationElementUnits as defined in section Section 3.7 of [RFC5610] Element Range Begin: The IPFIX informationElementRangeBegin as defined in section Section 3.7 of [RFC5610] Element Range End: The IPFIX informationElementRangeEnd as defined in section Section 3.7 of [RFC5610] Akhter Expires April 20, 2011 [Page 4] Internet-Draft IPFIX PerfMon October 2010 Element Id: The IPFIX global unique element ID as defind in Section 3.2 of [RFC5101] Status: The status of the specification of this IPFIX Information Element. Use and Applications An explanation of how this particular information element would be used. Calculation Method: In the case of metrics, this section describes how the metric is calculated, as well as any special conditions. Units of Measurement: In the case of metrics, what are the units of measurement. The text here is expected to be wider and more descriptive than in the IPFIX Element Units section. Measurement Timing: Discussion on the acceptable range of timing and sampling intervals. 3. General Usage 3.1. Quality of Service (QoS) Monitoring The network operator needs to be able to gauge the end user's satisfaction with the network service. While there are many components of the satifaction such as pricing, packaging, offering, etc., a major compoenent of satisfaction is delivering a consistant service. The user builds trust on this consistancy of the network service and is then to be able to run network applications-- which is of course the the end goal. Without the ability to deliver a consistant service for end user network applications network operator will be left dealing with price sensitive disgruntled users with very low expectations (if they don't have choice of operator) or abandonment (if they have choice). 3.2. Service Level Agreemnt (SLA) Validation Similar to QoS and QoE validation, there might be contractual or regulatory requirements that need to be met by the network operator. Monitoring the performance of the flows allows the application operator, network operator as well as the end user to validate of the target service is being deliverd. While there is quite a diverstiy in the codification of network SLAs they may eventually involve some measurement of network uptime, end to end latency, end to end jitter and perhaps service response time. In the case violation of the SLA, the start and end times, nature and network scope of the violation needs to be captured to allow for the most accurate settling of the Akhter Expires April 20, 2011 [Page 5] Internet-Draft IPFIX PerfMon October 2010 SLA. 3.3. Fault Isolation and Troubleshooting It has been generally easier to troubelshoot and fix problems that are binary in nature: it either works or does not work. The host is pingable or not pingable. However, the much more difficult to resolve issues that are transitory in nature, move from location to location, more complicated that simple ICMP reachability and many times unverifiable reports by the users themselves. It is these intermittant and seemingly inconsistant network impairments that performance metrics can be extremely helpful with. Just the basic timely detection that there is a problem (or an impending problem) can give the providor the confidence that there is a real problem that needs to be resolved. The next step woudl be to assist the operator in a speedy resolution by providing information regarding the network locaton and nature of the problem. 4. New Information Elements The information elements are organized into two main groups: Transport Layer: Metrics that might be calculated from observations at higher layers but essentially provide information about the network transport of user date. For example, the metrics related to packet loss, latency and jitter would be defined here. User and Application Layer: Metrics that are might be affected by the network indirectly, but are ultimately related to user, end- system and session states. For example, session setup time, transaction rate and session duration would be defined here. Contextual Elements Information elements that provide further context to the metrics. For example, media type, codec type, and type of application would be defined here. 4.1. Transport Layer 4.1.1. perfPacketLoss Name: perfPacketLoss Description: The packet loss metric reports the number of individual packets that were lost in the reporting interval. Akhter Expires April 20, 2011 [Page 6] Internet-Draft IPFIX PerfMon October 2010 Observation Point: The observation can be made anywhere along the media path or on the endpoints them selves. The observation is only relevant in a unidirectional sense. Element Data Type: unsigned32 Element Semantics: deltaCounter Element Units: packets Element Range Begin: 0 Element Range End: 0xFFFFFFFE Element Id: TBDperfPacketLoss Status: current Use and Applications The packet loss metric can be used to determine if there is a network impairment that is causing packet loss upstream of the measurement point. When there are observation points on either side of the impairment location it is possible to locate the impairment. With the location information teh operator can is able to perform quicker fault-isolation as well as shorten time to resolution. Calculation Method: This metric requires that each IP packet be individually marked with a monotonically incrementing sequence number. A number of encapsulations support this type of sequencing: IPSec ESP [RFC4303], GRE [RFC2890] and RTP [RFC3550]. And an analyis of the sequence number field can yeild the lost number of packets. In certain cases, there might be an element of discovery and synchronization of the flow it self before the measurement can be made. An example of this can be found for RTP flows running on ephemeral UDP port numbers. In these cases, reporting 0 as packet loss would be misleading and the value 0xFFFFFFFF must be used in cases where the packet loss value can not be determined. In the case of a monitor interval where synchronization was acheived mid-interval, the loss packet counter MAY be used to represent remainder of the interval. As this metric is a deltaCounter the number of loss packets only represent the observation within the reporting interval. Due to the dependency on the arrival of a packet with a sequcnce number to calculate loss, the loss calculation may be indefenitely delayed if no more packets arrive at all. For the case of RTP, in addition to the 16 bit sequence number field in RFC3550, there is also the additional 16-bit high-order sequence number field (for a total of 32-bit seq number space) that is used in RFC3497 Akhter Expires April 20, 2011 [Page 7] Internet-Draft IPFIX PerfMon October 2010 [RFC3497]. RFC3497 traffic runs at a very high rate and the 32- bit field allow for additional time for wrapping (21 seconds). So, a loss span of greater than 21 seconds measured only by the 16-bit field will lead to inaccurate reporting. In the case of secure RTP [RFC3711], the relevant portion of the RTP header is in the clear and lost packet counting can still be performed. It is important to note that the sequence number space is unique per RTP SSRC. Therefore it is important to track the high sequence number seen on a per SSRC-5-tuple basis. There may be multiple SSRCs in a single 5-tuple. Certain applications inject non-RTP traffic into the same 5-tuple as the media stream. RTCP packets may be seen in the same 5-tuple as the RTP stream [RFC5761], and STUN [RFC3489] packets may also be seen. The loss detection should ignore these packets. Units of Measurement: packets Measurment Timing To be able to calculate this metric a continous set of the flow's packets (as each woudl have an incrementing sequence number) needs to be monitored. Therefore, per-packet sampling would prevent this metric from being calculated. However, there are other sampling methodoligies that might be usabel. It is possible to generate sampled metrics by sampling spans of continous packets, however a portion of the span may have to be utilized for resynchronization of the sequence number. Another form of acceptable sampling would be at the flow level. 4.1.2. perfPacketExpected Name: perfPacketExpected Description: The number of packets there were expected within a monitoring interval. Observation Point: The observation can be made anywhere along the media path or on the endpoints them selves. The observation is only relevant in a unidirectional sense. Element Data Type: unsigned32 Element Semantics: deltaCounter Element Units: none Element Range Begin: 0 Akhter Expires April 20, 2011 [Page 8] Internet-Draft IPFIX PerfMon October 2010 Element Range End: 0xFFFFFFFE Element Id: TBDperfPacketExpected Status: current Use and Applications The perfPacketExpected is a mid-calculation metric used in the calculation of perfPacketLossRate. Calculation Method: The subtraction of the last sequence number from the first sequence numebr in monitoirng interval yeilds the expected count. As discussed with perfPacketLost, there might be a delay due to synchronization with the flows's sequnce numbers and in such times the value of the metric should be set to 0xFFFFFFFE. Care has to be taken to account for cases where the packet's sequence number field wraps. For RTP the expected count calculation formula can be found in Appendix A.3 of [RFC3550] Units of Measurement: packets Measurment Timing Same considerations as perfPacketLoss 4.1.3. perfPacketLossRate Name: perfPacketLossRate Description: Percentage of number of packets lost out of the total set of packets sent. Observation Point: The observation can be made anywhere along the media path or on the endpoints them selves. The observation is only relevant in a unidirectional sense. Element Data Type: unsigned16 Element Semantics: quantity Element Units: none Element Range Begin: 0 Element Range End: 0xFFFE Element Id: TBDperfPacketLossRate Akhter Expires April 20, 2011 [Page 9] Internet-Draft IPFIX PerfMon October 2010 Status: current Use and Applications The perfPacketLossRate metric can be used to normalize the perfPacketLoss metric to handle cases where different flows are running at different packet per second (pps) rates. Due to the normalization, comparisons can now be made against thresholds (for creating alerts, etc.). In addition, the percentage form of the metric allows for comparisons against other flows at the same observation point to determine if there is an equal bias for drops between the flows. Otherwise, the perfoPacketLossRate is used in same way as perfPacketLoss. Calculation Method: The number of lost packets divided by the number of expected packets in an interval period multiplied by 100. In cases where perfPacketLoss is unknown (for example due to synchronization issues), the perfPacketLossRate woudl also be unknown. In such cases perfPacketLossRate MUST be set to 0xFFFF. If there are multiple flows whose loss rate is being aggregated, then the average of the individual flows is used. Units of Measurement: percentage Measurment Timing Same notes as perfPacketLossRate 4.2. User and Application Layer 4.2.1. perfSessionSetupDelay Name: perfSessionSetupDelay Description: The Session Setup Delay metric reports the time taken from a request being initiated by a host/endpoint to the response (or request indicator) to the request being observed. This metric is defined in [RFC4710], however the units have been updated to microseconds. Observation Point: This metric needs to be calculated where both request and response can be observed. This could be at network choke points, application proxies, or within the end systems themselves. Element Data Type: unsigned32 Element Semantics: quantity Akhter Expires April 20, 2011 [Page 10] Internet-Draft IPFIX PerfMon October 2010 Element Units: microseconds Element Range Begin: 0 Element Range End: 0xFFFFFFFE Element Id: TBDperfSessionSetupDelay Status: current Use and Applications The session setup delay metric can measure the end user initial wait experience as seen from the network transaction level. The value will not only include the network flight time, but also includes the server response time and may be used to alert the operator in cases where the overall service is overloaded and thus sluggish, or within normal operating values. Calculation Method: Measure distance in time between the first bit of request and the first bit of the response. For the case of SIP, please see Section 4.3.1 of [I-D.ietf-pmol-sip-perf-metrics] Units of Measurement: microseconds Measurment Timing This measurement can be sampled on a session by session basis. It may be advisable to set sample targets on a per source range - to destination basis. Due to the nature of measurement intervals, there may be a period of time (and thus measurement reports) in which the perfSessionSetupDelay value has not been calculated. In these cases the value 0xFFFFFFFE MUST be used and can be interpreted to mean not applicable. For measurement intervals after perfSessionSetupDelay has been calculated and the existing calculated perfSessionSetupDelay value SHOULD be sent if reporting only on that single session. However, if multiple sessions are summarized in the report then the average for perfSessionSetupDelay values calculated in the most recent interval SHOULD be used. The intention with this behavior is to acknowledge that the value has not bee calculated, and when it has provide the freshest values available. 4.3. Contextual Elements 4.3.1. mediaRTPSSRC Name: mediaRTPSSRC Akhter Expires April 20, 2011 [Page 11] Internet-Draft IPFIX PerfMon October 2010 Description: Value of the syncronization source (SSRC) field in the RTP header of the flow. This field is defined in [RFC3550] Observation Point: This metric can be gleaned from the RTP packets directly, so the observation point needs to on the flow path or within the endpoints. Element Data Type: unsigned32 Element Semantics: identifier Element Units: octets Element Range Begin: 0 Element Range End: 0xFFFFFFFE Element Id: TBDmediaRTPSSRC Status: current Use and Applications The RTP SSRC value denotes a specific media stream. As such when trying to differentiate media stream problems between session participants the SSRC field is needed. Calculation Method: Copy from RTP header's SSRC field as defined in [RFC3550]. In the case of a non-RTP flow, or the time period in which the flow has not been verfied to be a RTP flow the value 0xFFFFFFFE MUST be reported. Units of Measurement: identifer Measurment Timing It is possible that the SSRC may have be renegotiated mid-session due to collisions with other RTP senders. 4.3.2. mediaRTPPayloadType Name: mediaRTPPayloadType Description: The value of the RTP Payload Type Field as seen in the RTP header of the flow. This field is defind in [RFC3550] Observation Point: This metric can be gleaned from the RTP packets directly, so the observation point needs to on the flow path or within the endpoints. Akhter Expires April 20, 2011 [Page 12] Internet-Draft IPFIX PerfMon October 2010 Element Data Type: unsigned16 Element Semantics: identifier Element Units: octets Element Range Begin: 0 Element Range End: 0xFF Element Id: TBDmediaRTPPayloadType Status: current Use and Applications The RTP PT conveys the payload format and media encoding used in the RTP payload. For simple cases, where the RTP PT is from the statically defiend range this can lead to an understanding of type of media codec used. With the knowledge of the codec being used the degree of media impairment (given loss values and jitter) can be estimated better. However, for more recent codecs, the RTP dynamic range is used. In these cases the RTP payload values are dynamically negotiated. In the case of a non-RTP flow, or the time period in which the flow has not been verfied to be a RTP flow, the value 0xFFFF MUST be reported. Calculation Method: Copy from RTP header's RTP-PT field as defined in [RFC3550] Units of Measurement: identifer Measurment Timing 5. Security Considerations The recommendations in this document do not introduce any additional security issues to those already mentioned in [RFC5101] and [RFC5477] 6. IANA Considerations This document requires an elements assignment to be made by IANA. 7. References Akhter Expires April 20, 2011 [Page 13] Internet-Draft IPFIX PerfMon October 2010 7.1. Normative References [RFC5101] Claise, B., "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information", RFC 5101, January 2008. [RFC5610] Boschi, E., Trammell, B., Mark, L., and T. Zseby, "Exporting Type Information for IP Flow Information Export (IPFIX) Information Elements", RFC 5610, July 2009. [RFC4710] Siddiqui, A., Romascanu, D., and E. Golovinsky, "Real-time Application Quality-of-Service Monitoring (RAQMON) Framework", RFC 4710, October 2006. [RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. Meyer, "Information Model for IP Flow Information Export", RFC 5102, January 2008. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003. [RFC3497] Gharai, L., Perkins, C., Goncher, G., and A. Mankin, "RTP Payload Format for Society of Motion Picture and Television Engineers (SMPTE) 292M Video", RFC 3497, March 2003. [RFC3489] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy, "STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)", RFC 3489, March 2003. [I-D.ietf-pmol-sip-perf-metrics] Malas, D. and A. Morton, "Basic Telephony SIP End-to-End Performance Metrics", draft-ietf-pmol-sip-perf-metrics-07 (work in progress), September 2010. [iana-ipfix-assignments] Internet Assigned Numbers Authority, "IP Flow Information Export Information Elements (http://www.iana.org/assignments/ipfix/ipfix.xml)". 7.2. Informative References [I-D.ietf-pmol-metrics-framework] Clark, A. and B. Claise, "Guidelines for Considering New Performance Metric Development", draft-ietf-pmol-metrics-framework-05 (work in progress), Akhter Expires April 20, 2011 [Page 14] Internet-Draft IPFIX PerfMon October 2010 October 2010. [I-D.trammell-ipfix-ie-doctors] Trammell, B. and B. Claise, "Guidelines for Authors and Reviewers of IPFIX Information Elements", draft-trammell-ipfix-ie-doctors-00 (work in progress), October 2010. [RFC2508] Casner, S. and V. Jacobson, "Compressing IP/UDP/RTP Headers for Low-Speed Serial Links", RFC 2508, February 1999. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004. [RFC2250] Hoffman, D., Fernando, G., Goyal, V., and M. Civanlar, "RTP Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998. [RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE", RFC 2890, September 2000. [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", RFC 4303, December 2005. [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and Control Packets on a Single Port", RFC 5761, April 2010. [I-D.huici-ipfix-sipfix] Huici, F., Niccolini, S., and S. Anderson, "SIPFIX: Use Cases and Problem Statement for VoIP Monitoring and Exporting", draft-huici-ipfix-sipfix-00 (work in progress), June 2009. [nProbe] "probe - NetFlow/IPFIX Network Probe (http://www.ntop.org/nProbe.html)". [RFC2321] Bressen, A., "RITA -- The Reliable Internetwork Troubleshooting Agent", RFC 2321, April 1998. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC5477] Dietz, T., Claise, B., Aitken, P., Dressler, F., and G. Carle, "Information Model for Packet Sampling Exports", RFC 5477, March 2009. Akhter Expires April 20, 2011 [Page 15] Internet-Draft IPFIX PerfMon October 2010 [VoIP-monitor] L. Chang-Yong, H. Kim, K. Ko, J. Jim, and H. Jeong, "A VoIP Traffic Monitoring System based on NetFlow v9, International Journal of Advanced Science and Technology, vol. 4, Mar. 2009". Author's Address Aamer Akhter Cisco Systems, Inc. 7025 Kit Creek Road RTP, NC 27709 USA Email: aakhter@cisco.com Akhter Expires April 20, 2011 [Page 16]