SCTP Tail Loss Recovery Enhancements
draft-nielsen-tsvwg-sctp-tlr-00.txt

Abstract

Loss Recovery by means of T3-Retransmission has a significant detrimental impact on the delays experienced through an SCTP association. The throughput achievable over an SCTP association is also negatively impacted by the occurrence of T3-Retransmissions. Loss Recovery by Fast Retransmission is in most situations superior to T3-Retransmission from both a latency and a throughput perspective. The present SCTP Fast Recovery algorithms, as specified by [RFC4960], are not able to recover losses adequately or in a timely manner in certain situations, and thus resort to loss recovery by lengthy T3-Retransmissions or by non-timely activation of Fast Recovery. In this document we propose a number of enhancements to the SCTP Loss Recovery algorithms aimed at amending some of these deficiencies, with a particular focus on Loss Recovery for drops in Traffic Tails. The enhancements supplement the existing algorithms of [RFC4960] with proactive probing and timer-driven accelerated activation of the Fast Retransmission algorithm. A number of enhancements to the Fast Retransmission algorithm itself are also proposed. The enhancements are proposed as supplements to the Loss Recovery algorithms of [RFC4960], and as such they do not deprecate or replace any of the mechanisms defined by [RFC4960].

The solution proposed draws on prior art in the area of SCTP and TCP Loss Recovery improvements. The mechanisms proposed include the adaptation to SCTP Fast Retransmission of certain improvements specified for TCP Fast Retransmission by [RFC6675]. The proposal also embeds SCTP Early Retransmit [RFC5827] in a delayed variant. The proposal draws heavily on the ideas put forward for TCP by [DUKKIPATI01] for proactive probing and timer-driven entry into Fast Recovery procedures. The proposal embeds certain aspects of [HURTIG] where applicable. The procedures proposed are sender-side only and do not impact the SCTP receiver.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 30, 2015.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

1.1. SCTP TLR Function
1.2. TCP applicability
1.3. Packet Re-ordering

2. Conventions and Terminology
3. Description of Algorithms

3.1. SCTP Scoreboard and miss indication counting enhancements
3.2. RFC6675 nextseg() tail loss enhancements for SCTP FR
3.3. SCTP-TLR Description

4. Evaluation of function
5. Socket API Considerations
6. Security Considerations
7. Acknowledgements
8. IANA Considerations
9. References

9.1. Normative References
9.2. Informative References

Author's Address

1. Introduction

Loss Recovery by means of T3-Retransmission has a significant impact on the delays experienced through, as well as the throughput achievable over, an SCTP association. Loss Recovery by Fast Retransmission (FR) is in most situations superior to T3-Retransmission from both a latency and a throughput perspective.

The present SCTP Fast Retransmission algorithm, as specified by [RFC4960], is driven solely by a dupthresh number of miss indications being counted from returned SACKs (the contents of which must fulfill certain conditions; for details the reader is referred to [RFC4960]). As such, it is not able to recover losses adequately or in a timely manner in traffic tails, where a sufficient number of such SACKs may not be generated, and it thereby resorts to loss recovery by T3-Retransmissions or by "non-timely" activation of Fast Recovery.

By drops in traffic tails we refer not only to "pure" tail drops, i.e., the drop of all packets at the end of the communication on an SCTP association from a certain point onwards, but more generally and specifically to the following situations:

Pure tail drops of the last SCTP packets of an SCTP association or, more generally, drops of packets at the end of an SCTP association which are not followed by more than a dupthresh number of packets that are not dropped. Drops of either type are generally referred to as Tail Drops.
Tail Drops among packets sent at the end of bursts spaced by pauses equal to or greater than (approximately) the T3-timeout. It is noted that such bursts (pauses between bursts) may result from application limitations, from congestion control limitations or from receiver-side limitations.
Drops among packets sent so sparsely that each dropped packet constitutes a tail drop, in that a dupthresh number of packets would not be sent (would not be available for sending) prior to expiry of the T3-timeout.

It shall be noted that while the above traffic drop criteria describe drops among the forward data packets only, drops among forward data packets combined with drops of the returned SACKs may together result in an insufficient number of SACKs being returned to the traffic sender for the Fast Retransmission algorithm to be activated before the T3-timeout occurs. The tail traffic situations in which SCTP FR is not able to recover the losses are thus in general broader than the exact situations listed above. The improvements proposed include an enhancement of SCTP to deduce the miss indication counts from an enhanced SACK scoreboard, thus removing some of the vulnerability of the present SCTP miss indication counting to loss of SACKs.
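As an illustration of scoreboard-based counting, the following is a minimal sketch only, not the algorithm of this document (which is yet to be specified in Section 3). It assumes the sender keeps a mapping from each TSN to the packet that carried it; the names miss_indications, is_lost and packet_of, and the dupthresh value of 3, are assumptions made for the example, and TSN wrap-around is ignored. Because the count is derived from the set of TSNs known to be SACKed rather than from the number of SACK chunks received, a single surviving SACK carrying the full gap information yields the same count as several individual SACKs would have.

```python
from typing import Dict, Set

DUPTHRESH = 3  # assumed value for the example


def miss_indications(missing_tsn: int, sacked_tsns: Set[int],
                     packet_of: Dict[int, int]) -> int:
    """Count distinct SACKed packets carrying a TSN higher than missing_tsn.

    packet_of maps each TSN to a (hypothetical) packet identifier, so that
    several data chunks bundled in one packet count as one miss indication.
    """
    packets = {packet_of[tsn] for tsn in sacked_tsns if tsn > missing_tsn}
    return len(packets)


def is_lost(missing_tsn: int, sacked_tsns: Set[int],
            packet_of: Dict[int, int], dupthresh: int = DUPTHRESH) -> bool:
    """Treat the TSN as lost once dupthresh higher packets are SACKed."""
    return miss_indications(missing_tsn, sacked_tsns, packet_of) >= dupthresh


# TSN 10 is missing; TSNs 11 and 12 were bundled in packet 2.
packet_of = {10: 1, 11: 2, 12: 2, 13: 3, 14: 4}
print(is_lost(10, {11, 12, 13, 14}, packet_of))
```

In the usage above, three distinct packets (2, 3 and 4) carrying higher TSNs have been SACKed, so TSN 10 is classified as lost even if only one SACK reached the sender.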

It is noted that the Early Retransmit algorithm [RFC5827] addresses activation of Fast Recovery for a particular subset of the above tail drop situations. The solution proposed embeds (as a special case) the Early Retransmit algorithm in the delayed variant experimented with for TCP in [DUKKIPATI02], in which Early Retransmission is only activated provided a certain time has elapsed since the lowest outstanding TSN was transmitted. The delay adds robustness against spurious retransmissions caused by "mild" packet re-ordering, as documented for TCP in [DUKKIPATI02].
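The delayed variant can be pictured as an ordinary Early Retransmit check with an additional time gate. The sketch below is illustrative only: the function name, its parameters and the er_delay value are assumptions, and the threshold follows the packet-based rule of [RFC5827] (fewer than four packets outstanding, no new data available, threshold equal to the number of outstanding packets minus one).

```python
def early_retransmit_allowed(outstanding_packets: int, miss_count: int,
                             has_unsent_data: bool, now: float,
                             lowest_tsn_sent_at: float,
                             er_delay: float) -> bool:
    """Return True if a delayed Early Retransmit may be triggered.

    er_delay is a hypothetical parameter: the minimum time that must have
    elapsed since the lowest outstanding TSN was transmitted.
    """
    # Early Retransmit only applies to small flights with nothing new to send.
    if outstanding_packets >= 4 or has_unsent_data:
        return False
    er_thresh = outstanding_packets - 1
    if er_thresh < 1 or miss_count < er_thresh:
        return False
    # The delay gate: hold back to tolerate mild packet re-ordering.
    return (now - lowest_tsn_sent_at) >= er_delay


# Two packets outstanding, one miss indication, 100 ms since transmission.
print(early_retransmit_allowed(2, 1, False, now=1.0,
                               lowest_tsn_sent_at=0.9, er_delay=0.05))
```

Without the final time check the function reduces to the undelayed Early Retransmit condition; the gate is what keeps a briefly re-ordered packet from triggering a spurious retransmission.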

1.1. SCTP TLR Function

The function proposed for enhancement of the SCTP Loss Recovery operation for Traffic Tail Losses is divided into two parts:

Enhancements of the SCTP Fast Retransmission (SCTP FR) algorithm by means of the introduction of SCTP FR equivalents of the following Tail Loss Recovery improving functions inspired by or specified by [RFC6675] for TCP.
- Counting miss indications for a missing (non-SACK'ed) TSN based on augmented SACK scoreboard information, in which the miss indications are based on the number of SACK'ed SCTP packets carrying data chunks of higher TSNs. The mechanism is specified both in terms of packets, the book-keeping of which requires new logic, and in terms of a less implementation-demanding byte-based variant following the IsLost() approach of [RFC6675].
- The NextSeg() (3) and NextSeg() (4) Rescue operations of [RFC6675], supporting conditional proactive fast retransmission of missing TSNs that are within the Fast Recovery Exit Point but not yet classified as lost.
New SCTP Tail Loss Recovery state machine with proactive, timer-driven activation of (the improved) Fast Recovery operation whenever network responsiveness (SACKs of packets) has been proven within a certain time, shorter than the T3-timeout, from the transmittal of the lowest outstanding TSN. The SCTP TLR mechanism implements a new (shorter than RTO) timer, the Tail Loss Recovery timer (TLRTO), and it works in part by:
- forcing entry into Fast Recovery when network responsiveness has been proven and the TLRTO timer has expired, but additional traffic sent (SACKs of additional traffic sent) has not served to activate Fast Recovery based on the dupthresh-driven miss indications,
- probing for network responsiveness, by transmittal of a TLR probe packet, when no other information is available at expiry of the TLRTO timer (no SACKs have been received for any packets in the traffic tail),
- allowing for T3-retransmission Loss Recovery only when the network remains unresponsive (no SACK received for any traffic in the tail nor for the probe packet).
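The three behaviours listed above for the TLRTO timer can be summarized as a decision taken at timer expiry. The following is a rough sketch under stated assumptions, not the state machine of this document (which is to be specified in Section 3.3): the type and field names (Action, TailState, in_fast_recovery, sack_seen_for_tail, probe_sent) are hypothetical, and the handling of a timer expiry while Fast Recovery is already active is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    ENTER_FAST_RECOVERY = "enter_fast_recovery"
    SEND_TLR_PROBE = "send_tlr_probe"
    WAIT_FOR_T3 = "wait_for_t3"


@dataclass
class TailState:
    in_fast_recovery: bool    # dupthresh-driven Fast Recovery already active
    sack_seen_for_tail: bool  # a SACK covered tail traffic or the probe
    probe_sent: bool          # a TLR probe packet has already been sent


def on_tlrto_expiry(state: TailState) -> Action:
    """Decide what to do when the (assumed) TLRTO timer expires."""
    if state.in_fast_recovery:
        # Assumption: nothing to force, recovery is already under way.
        return Action.NONE
    if state.sack_seen_for_tail:
        # Network responsiveness proven: force entry into Fast Recovery.
        return Action.ENTER_FAST_RECOVERY
    if not state.probe_sent:
        # No information at all: probe for network responsiveness.
        return Action.SEND_TLR_PROBE
    # Probe sent and still nothing SACKed: leave recovery to the T3 timer.
    return Action.WAIT_FOR_T3


print(on_tlrto_expiry(TailState(False, False, False)))
```

The ordering of the checks reflects the intent that T3-retransmission is the last resort, reached only when neither the tail traffic nor the probe has produced a SACK.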

It is noted that, depending on the exact situation (e.g., drop pattern, congestion window and amount of data in flight), T3-retransmission procedures need not be inferior to Fast Retransmission procedures. Rather, in some situations T3-retransmission will indeed be superior, as T3-retransmission allows for ramp-up of the congestion window during the recovery process and, by its nature of declaring all outstanding data as lost, never risks being blocked by congestion window limitations. The changes proposed in this document focus on improving the Loss Recovery operation of SCTP by enforcing timely activation of (improved) Fast Retransmission algorithms. With the purpose of reducing the latency of the TCP and SCTP Loss Recovery operation, [HURTIG] has taken the alternative approach of accelerating the activation of T3-retransmission when Fast Recovery is not able to kick in to recover the loss. [HURTIG] only addresses a subset of the tail loss scenarios in scope of the work presented here. The ideas of [HURTIG] for accurate RTO restart are drawn on in the solution proposed here for accurate restart of the new tail loss recovery timer (TLRTO timer), as well as for accurate setting of the T3-timer under certain conditions, thus harvesting some of the same latency optimizations as [HURTIG].
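The core idea of an accurate timer restart can be sketched as follows: when the timer is restarted, the time already elapsed since the earliest outstanding data was transmitted is subtracted, so that the timer expires one full timeout after that transmission rather than one full timeout after the restart. The function and parameter names below are assumptions, and the sketch omits the conditions under which [HURTIG] applies the adjustment.

```python
def restart_interval(timeout: float, now: float,
                     earliest_outstanding_sent_at: float) -> float:
    """Return the interval with which to re-arm a retransmission timer.

    timeout is the nominal timer value (RTO, or the assumed TLRTO);
    all times are in seconds on the same clock.
    """
    elapsed = now - earliest_outstanding_sent_at
    # Never arm the timer with a negative interval.
    return max(0.0, timeout - elapsed)


# Nominal timeout 1 s; the earliest outstanding TSN was sent 250 ms ago.
print(restart_interval(1.0, now=10.0, earliest_outstanding_sent_at=9.75))
```

In the usage above the timer is re-armed with 0.75 s rather than a full second, which removes the extra delay a plain restart would add.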

OPEN ISSUE: It is to be determined whether [HURTIG], or plain T3-retransmission as in [RFC4960], is preferable to the solution proposed here in certain situations. Candidate situations include those where the Fast Retransmission algorithm (when activated via the new proactive approach) is blocked by congestion control limitations. If the issue is significant, the remedy may be either to amend the CC operation during SCTP FR or to redesign the solution proposed here to promote proactive T3-retransmission operation rather than Fast Retransmission.

Finally, it shall be noted that because the solution prompts activation of Fast Recovery instead of T3-Recovery, its benefit over the existing solution of [RFC4960] will depend on the CC operation not only during the recovery process but also after exit from the recovery process. In this context it is noted that the prior approach taken for TCP, [DUKKIPATI01], assumed that CUBIC runs after exit from Fast Recovery, whereas SCTP runs a CC algorithm more similar to the TCP CC defined by [RFC5681].

The SCTP TLR procedures proposed apply as add-on supplements to any SCTP implementation based on [RFC4960]. The procedures are sender-side only and do not impact the SCTP receiver.

1.2. TCP applicability

SCTP Loss Recovery operation is at its core based on the design of Loss Recovery for TCP with SACK enabled. The enhancements of SCTP Tail Loss Recovery proposed here are readily applicable to TCP.

It is noted that while the SCTP TLR algorithms and SCTP TLR state machine defined here are inspired by the timer-driven tail loss probe approach specified in [DUKKIPATI01] for TCP, the solution defined here differs in the approach taken. The approach here is a clean-slate approach defining a new comprehensive SCTP TLR state machine on top of (in addition to) the existing Fast Recovery and T3-Recovery states, covering all tail loss patterns. The approach of [DUKKIPATI01], by contrast, relies on a number of experimental mechanisms ([DUKKIPATI02], [MATHIS], [RFC5827]) defined for TCP in the IETF or in research, with ad hoc extensions to support selected tail loss patterns through the addition of the tail loss probe mechanism and the resulting activation of the mentioned mechanisms.

1.3. Packet Re-ordering

The solution is an enhancement of the existing dupthresh based Fast Recovery operation of SCTP, [RFC4960], and as such the solution inherits the fundamental vulnerability to packet re-ordering that the SCTP Fast Recovery algorithm of [RFC4960] embeds.

The solution does not increase the vulnerability of Loss Recovery to packet re-ordering, as demonstrated by (to be filled in).

2. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Description of Algorithms

Missing. To be filled in for version 01. For reference see http://www.ietf.org/proceedings/90/slides/slides-90-tsvwg-16.pdf.

3.1. SCTP Scoreboard and miss indication counting enhancements

3.2. RFC6675 nextseg() tail loss enhancements for SCTP FR

3.3. SCTP-TLR Description

4. Evaluation of function

5. Socket API Considerations

This section describes how the socket API defined in [RFC6458] is extended to provide a way for the application to control the retransmission algorithms in operation in the SCTP layer.

A socket option for control of the features is yet to be defined.

Please note that this section is informational only.

6. Security Considerations

There are no new security considerations introduced by the functions defined in this document.

7. Acknowledgements

The author wishes to express her gratitude to Henrik Jensen for his invaluable and indispensable contribution to the definition of, the implementation of and the experiments with the function.

8. IANA Considerations

This document does not create any new registries or modify the rules for any existing registries managed by IANA.

9. References

9.1. Normative References

[RFC2119]	Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4960]	Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.
[RFC5062]	Stewart, R., Tuexen, M. and G. Camarillo, "Security Attacks Found Against the Stream Control Transmission Protocol (SCTP) and Current Countermeasures", RFC 5062, September 2007.
[RFC6675]	Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M. and Y. Nishida, "A Conservative Loss Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP", RFC 6675, August 2012.

9.2. Informative References

[DUKKIPATI01]	Dukkipati, N., Cardwell, N., Cheng, Y. and M. Mathis, "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of Tail", Work in Progress (expired), February 2013.
[DUKKIPATI02]	Dukkipati, N., Mathis, M., Cheng, Y. and M. Ghobadi, "Proportional Rate Reduction for TCP", Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement, November 2011.
[HURTIG]	Hurtig, P., et al., "TCP and SCTP RTO Restart", draft-ietf-tcpm-rtorestart-03, Work in Progress, July 2014.
[MATHIS]	Mathis, M., "FACK", ACM SIGCOMM Computer Communication Review 26(4), October 1996.
[RFC5681]	Allman, M., Paxson, V. and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.
[RFC5827]	Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and P. Hurtig, "Early Retransmit for TCP and Stream Control Transmission Protocol (SCTP)", RFC 5827, May 2010.
[RFC6458]	Stewart, R., Tuexen, M., Poon, K., Lei, P. and V. Yasevich, "Sockets API Extensions for the Stream Control Transmission Protocol (SCTP)", RFC 6458, December 2011.

Author's Address

Karen E. E. Nielsen
Ericsson
Kistavaegen 25
Stockholm, 164 80
Sweden
EMail: karen.nielsen@tieto.com