SCTP Tail Loss Recovery Enhancements
draft-nielsen-tsvwg-sctp-tlr-01.txt

Abstract

Loss Recovery by means of T3-Retransmission has a significant detrimental impact on the delays experienced through an SCTP association. The throughput achievable over an SCTP association is also negatively impacted by the occurrence of T3-Retransmissions. Loss Recovery by Fast Retransmission is in most situations superior to T3-Retransmission from both a latency and a throughput perspective. The present SCTP Fast Recovery algorithms, as specified by [RFC4960], are not able to recover losses adequately or in a timely manner in certain situations, and thus resort to loss recovery by lengthy T3-Retransmissions or by non-timely activation of Fast Recovery. In this document we propose a number of enhancements to the SCTP Loss Recovery algorithms aimed at amending some of these deficiencies, with a particular focus on Loss Recovery for drops in Traffic Tails. The enhancements supplement the existing algorithms of [RFC4960] with proactive probing and timer driven activation of the Fast Retransmission algorithm; a number of enhancements of the Fast Retransmission algorithm itself are also proposed. The enhancements are proposed as supplements to the Loss Recovery algorithms of [RFC4960] and as such they do not deprecate or replace any of the mechanisms defined by [RFC4960].

The solution proposed draws on prior art in the area of SCTP and TCP Loss Recovery improvements. The mechanisms proposed include the adaptation to SCTP Fast Retransmission of certain improvements specified for TCP Fast Retransmission by [RFC6675], and the proposal embeds SCTP Early Retransmit [RFC5827] in a delayed variant. The proposal draws heavily on the ideas put forward for TCP by [DUKKIPATI01] for proactive probing and timer driven entering of Fast Recovery. The proposal embeds certain aspects from [HURTIG] when applicable. The procedures proposed are sender-side only and do not impact the SCTP receiver.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 15, 2015.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

1.1. SCTP TLR Function
1.2. TCP applicability
1.3. Packet Re-ordering
1.4. Congestion Control

2. Conventions and Terminology
3. Description of Algorithms

3.1. SCTP Scoreboard and Mis Indication Counting Enhancements

3.1.1. Highest TSN Newly Acknowledged Extension

3.2. RFC6675 nextseg() Tail Loss Enhancements for SCTP FR
3.3. SCTP-TLR Description

3.3.1. Principles
3.3.2. SCTP - TLR Statemachine
3.3.3. TLPP Transmission Rules
3.3.4. TLPP Recovered Losses

3.4. SCTP MH Considerations

4. Evaluation of function
5. Socket API Considerations
6. Security Considerations
7. Acknowledgements
8. IANA Considerations
9. References

9.1. Normative References
9.2. Informative References

Author's Address

1. Introduction

Loss Recovery by means of T3-Retransmission has a significant impact on the delays experienced through, as well as the throughput achievable over, an SCTP association. Loss Recovery by Fast Retransmission (FR) is in most situations superior to T3-Retransmission from both a latency and a throughput perspective.

The present SCTP Fast Retransmission algorithm, as specified by [RFC4960], is driven solely by a TSN reaching a dupthresh number of mis indication counts stemming from returned SACKs. As such it is not able to recover losses adequately or in a timely manner in traffic tails, where a sufficient number of such SACKs may not be generated, and it thus resorts to loss recovery by T3-Retransmissions or by "non-timely" activation of Fast Recovery.

By drop in traffic tails we refer not only to "pure" tail drops, i.e., drop of all packets in the end of the communication on an SCTP association from a certain point onwards, but more generally and specifically to the following situations:

Pure tail drops of the last SCTP packets of an SCTP association or, more generally, drops of packets at the end of an SCTP association which are not followed by more than a dupthresh number of packets which are not dropped. Drops of either type we will generally refer to as Tail Drops.
Tail Drops among packets sent at the end of bursts spaced by pauses of time equal to or greater than the T3-timeout (approximately). It is noted that such bursts (pauses in between bursts) may result from application limitations, from congestion control limitations or from receiver side limitations.
Drops among packets sent so sparsely that each dropped packet constitutes a tail drop in that a dupthresh number of packets would not be sent (would not be available for sending) prior to expiry of the T3-timeout.

It shall be noted that while the above traffic drop criteria describe drops among the forward data packets only, drops among forward data packets combined with drops of the returned SACKs may together result in an insufficient number of SACKs being returned to the traffic sender for the Fast Retransmission algorithm to be activated before the T3-timeout occurs. The tail traffic situations for which SCTP FR is not able to recover the losses are thus in general broader than the exact situations listed above. The improvements proposed include an enhancement of SCTP to deduce the mis indication counts from an enhanced SACK scoreboard, thus removing some of the vulnerability of the present SCTP mis indication counting to loss of SACKs.

It is noted that the Early Retransmit algorithm, [RFC5827], addresses activation of Fast Recovery for a particular subset of the above tail drop situations. The solution proposed embeds (as a special case) the Early Retransmit algorithm in the delayed variant experimented with for TCP in [DUKKIPATI02], in which Early Retransmission is only activated provided a certain time has elapsed since the lowest outstanding TSN was transmitted. The delay adds robustness towards spurious retransmissions caused by "mild" packet re-ordering as documented for TCP in [DUKKIPATI02].

1.1. SCTP TLR Function

The function proposed for enhancements of the SCTP Loss Recovery operation for Traffic Tail Losses is divided in two parts:

Enhancements of the SCTP Fast Retransmission (SCTP FR) algorithm by means of the introduction of SCTP FR equivalents of the following Tail Loss Recovery improving functions inspired by or specified by [RFC6675] for TCP.
- Counting mis indications for a missing (non-SACK'ed) TSN based on augmented SACK scoreboard information in which the mis indications will be based on the number of SACK'ed SCTP packets carrying data chunks of higher TSNs. The mechanism is specified both in terms of packets, the book-keeping of which requires new logic, as well as in terms of a less implementation demanding byte based variant following the IsLost() approach of [RFC6675]. We shall refer to this as Extended Mis Indication Counting.
- The "last resort" retransmission operations, NextSeg() 3) and NextSeg() 4), of [RFC6675] supporting conditional proactive fast retransmissions of missing TSNs within the Fast Recovery Exit Point but not yet classified as lost.
New SCTP Tail Loss Recovery state machine with proactive timer driven activation of (the enhanced) Fast Recovery operation whenever network responsiveness (SACKs of packets) has been proven within a certain time, shorter than the T3 timeout, from the transmittal of the lowest outstanding TSN. The SCTP TLR mechanism implements a new timer, the Tail Loss Probe timer (PTO), and it works in part by:
- forcing entering of Fast Recovery when network responsiveness has been proven and the PTO timer has expired, but additional traffic sent (SACKs of additional traffic sent) has not served to activate Fast Recovery based on the (extended) mis indication counting.
- probing, by transmittal of a TLR probe packet, for network responsiveness when no other information is available at expiry of the PTO timer (no SACKs have been received for any packets in the traffic tail).
- allowing for T3-retransmission Loss Recovery only when the network remains unresponsive (no SACK received for any traffic in the tail nor for the probe packet).

It is noted that, depending on the exact situation (e.g., drop pattern, congestion window and amount of data in flight), T3-retransmission procedures need not be inferior to Fast Retransmission procedures. Rather, in some situations T3-retransmission will indeed be superior, as T3-retransmissions allow for ramp up of the congestion window during the Recovery Process and as T3-retransmission, by its nature of declaring all outstanding data as lost, never risks being blocked by congestion window limitations. The changes proposed in this document focus on improving the Loss Recovery operation of SCTP by enforcing timely activation of (improved) Fast Retransmission algorithms. With the purpose of reducing the latency of the TCP and SCTP Loss Recovery operation, [HURTIG] has taken the alternative approach of accelerating the activation of T3-retransmission processes when Fast Recovery is not able to kick in to recover the loss. [HURTIG] only addresses a subset of the Tail loss scenarios in scope in the work presented here. The ideas of [HURTIG] for accurate RTO restart are drawn on in the solution proposed here for accurate restart of the new tail loss probe timer (PTO-timer) as well as for accurate setting of the T3-timer under certain conditions, thus harvesting some of the same latency optimizations as [HURTIG].

OPEN ISSUE: It is to be determined if [HURTIG], or plain T3-retransmission of [RFC4960], are opportune compared to the solution proposed here in certain situations. Speculated situations include situations where the Fast Retransmission algorithm (when activated via new proactive approach) is blocked by congestion control (CC) limitations. If the issue is significant, the remedy may be to look for special purpose amendments, like to amend the CC operation during SCTP FR or to redesign the solution to promote proactive T3-retransmission operation rather than Fast Retransmission in certain situations. Yet another remedy may be to generally look to improve the CC operation of SCTP.

The SCTP TLR procedures proposed apply as add-on supplements to any SCTP implementation based on [RFC4960]. The procedures are sender-side only and do not impact the SCTP receiver.

1.2. TCP applicability

SCTP Loss Recovery operation is at its core based on the design of Loss Recovery for TCP with SACK enabled. The enhancements of SCTP Tail Loss Recovery proposed here are readily applicable to TCP.

It is noted that while the SCTP TLR algorithms and the SCTP TLR state machine defined here are inspired by the timer driven tail loss probe approach specified in [DUKKIPATI01] for TCP, the solution defined here differs in the approach taken. The approach here is a clean-slate approach defining a new comprehensive SCTP TLR state machine on top of (in addition to) the existing Fast Recovery and T3-Recovery states, covering all tail loss patterns, whereas the approach of [DUKKIPATI01] relies on a number of experimental mechanisms ([DUKKIPATI02], [MATHIS], [RFC5827]) defined for TCP in the IETF or in research, with ad hoc extension to support selected tail loss patterns by addition of the tail loss probe mechanism and the activation of those mechanisms driven by it.

1.3. Packet Re-ordering

The solution proposed is an enhancement of the existing mis indication counting based Fast Recovery operation of SCTP, [RFC4960], and as such the solution inherits the fundamental vulnerability to packet re-ordering that the SCTP Fast Recovery algorithm of [RFC4960] embeds.

The solution does not increase the vulnerability of Loss Recovery to packet-reordering as demonstrated by (to be filled in).

1.4. Congestion Control

It shall be noted that in its very nature of prompting for activation of Fast Recovery instead of T3-Recovery then the benefit of the solution proposed versus the existing solution of [RFC4960] will depend on the CC operation not only during the recovery process but also after exit of the recovery process. In this context it is noted that the prior approach taken for TCP, [DUKKIPATI01], has been documented for a TCP implementation running CUBIC, whereas SCTP runs a CC algorithm more similar to TCP Reno CC as defined by [RFC5681].

The solution at present is defined within the constraints of the existing Congestion Control principles of SCTP as defined by [RFC4960]. It is anticipated that Congestion Control improvements are desirable for SCTP in general as well as for the functions defined here in particular.

2. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Description of Algorithms

3.1. SCTP Scoreboard and Mis Indication Counting Enhancements

3.1.1. Highest TSN Newly Acknowledged Extension

Entering of Fast Recovery in SCTP, as specified by [RFC4960], is driven by mis indication counts. When a TSN has received dupthresh=3 mis indication counts, the TSN is declared lost and will be eligible for fast retransmission via the Fast Recovery procedure.

Mis indication counts are in RFC4960 SCTP driven entirely by receipt of SACKs in accordance with the Highest TSN Newly Acknowledged algorithm (section 7.2.4 of [RFC4960]):

Highest TSN Newly Acknowledged (HTNA): For each incoming SACK, miss indications are incremented only for missing TSNs prior to the highest TSN newly acknowledged in the SACK. A newly acknowledged DATA chunk is one not previously acknowledged in a SACK.
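The HTNA rule can be illustrated by the following minimal sketch. It is not taken from [RFC4960]; the function name and data structures are hypothetical, and TSNs are treated as plain integers (serial number wrap-around is ignored). Note how a single SACK covering three packets yields only one mis indication, which is the sensitivity to lost SACKs discussed in this section.

```python
DUPTHRESH = 3  # threshold value used in this document


def htna_update(miss_counts, missing, newly_acked):
    """Apply the HTNA rule for one incoming SACK (illustrative sketch).

    miss_counts: dict mapping TSN -> mis indication count (updated in place)
    missing:     set of TSNs still reported missing after this SACK
    newly_acked: set of TSNs acknowledged for the first time by this SACK
    Returns the TSNs that reach DUPTHRESH with this SACK.
    """
    if not newly_acked:
        return []
    highest_new = max(newly_acked)
    lost = []
    for tsn in sorted(missing):
        # Only TSNs prior to the highest newly acknowledged TSN are counted,
        # and each SACK contributes at most one mis indication per TSN.
        if tsn < highest_new:
            miss_counts[tsn] = miss_counts.get(tsn, 0) + 1
            if miss_counts[tsn] == DUPTHRESH:
                lost.append(tsn)
    return lost
```

With three separate SACKs for TSNs 11, 12 and 13, a missing TSN 10 is declared lost on the third SACK. If the first two SACKs are dropped and one SACK reports all three TSNs, TSN 10 receives a single mis indication only.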

An evident issue with the HTNA algorithm is that it is vulnerable to loss of SACKs. In many situations loss of SACKs will result only in a slightly delayed entering of Fast Recovery for a dropped TSN, but generally, by relying on the HTNA algorithm only, loss of SACKs will further broaden the traffic tail situations where Fast Recovery either will not be activated in a timely manner or will not be activated at all due to the receipt of an insufficient number of SACKs.

In order to make SCTP Fast Recovery more robust towards drop of SACKs we describe the following extension of the HTNA algorithm to be supported by an SCTP implementation:

Newly Acked Packets ahead-of-line (NAPahol): For each incoming SACK, miss indications are incremented only for missing TSNs prior to the highest TSN newly acknowledged in the SACK. A newly acknowledged DATA chunk is one not previously acknowledged in a SACK. For each missing TSN thus potentially eligible for additional mis indication counts, the number of mis indications to be given shall follow the number of newly acknowledged packets ahead of line of the packet of the missing TSN.

The solution is robust towards split SACKs. The solution requires the SCTP implementation to keep track of the relationship between chunks and packets. One solution is for the SCTP implementation to maintain a monotonically incrementing packet sequence number to map chunks to packets and, for each outstanding chunk, to keep state of the packet id that the chunk was sent in as well as (incrementally updated) the packet ids of up to dupthresh-1 (=2) packets ahead of line for which chunks have been SACKed.
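A minimal sketch of the packet id book-keeping described above follows. The function name, the `packet_of` mapping and the per-TSN set of packet ids are illustrative assumptions, not part of any specification. For brevity the set is not capped at dupthresh-1 entries, and each chunk is assumed to have been sent in exactly one packet.

```python
DUPTHRESH = 3


def napahol_update(acked_ahead, missing, newly_acked, packet_of):
    """Apply NAPahol counting for one incoming SACK (illustrative sketch).

    acked_ahead: dict TSN -> set of packet ids ahead of line of that TSN for
                 which chunks have been SACKed (updated in place); the mis
                 indication count of a TSN is the size of its set
    missing:     set of TSNs still reported missing after this SACK
    newly_acked: set of TSNs acknowledged for the first time by this SACK
    packet_of:   dict TSN -> packet sequence number the chunk was sent in
    Returns the TSNs that reach DUPTHRESH with this SACK.
    """
    if not newly_acked:
        return []
    highest_new = max(newly_acked)
    lost = []
    for tsn in sorted(missing):
        if tsn >= highest_new:
            continue  # same eligibility condition as HTNA
        seen = acked_ahead.setdefault(tsn, set())
        before = len(seen)
        for acked in newly_acked:
            if packet_of[acked] > packet_of[tsn]:
                # A packet counts once even if its chunks are acknowledged
                # across several (split) SACKs.
                seen.add(packet_of[acked])
        if before < DUPTHRESH <= len(seen):
            lost.append(tsn)
    return lost
```

Because packet ids rather than SACKs are counted, one SACK reporting chunks from three later packets is enough to declare the missing TSN lost, while two SACKs that each report a chunk of the same packet contribute one mis indication in total.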

As an alternative to the above accurate packet counting then an SCTP implementation MAY instead support the following bytes counting based extension of the RFC4960 HTNA algorithm:

Highest Bytes Newly Acknowledged (HBNA): For each incoming SACK, miss indications are incremented only for missing TSNs prior to the highest TSN newly acknowledged in the SACK. A newly acknowledged DATA chunk is one not previously acknowledged in a SACK. For each missing TSN thus eligible for additional mis indication counts, the number of mis indications to be given shall follow the number of newly acknowledged bytes in the SACK ahead of line of the missing TSN in the following manner: Add-mis-indication-count(TSN) = mod_PMTU(Newly acknowledged bytes ahead of line(TSN))+1.

For both solutions (NAPahol, HBNA) it is noted that an SCTP implementation only needs to keep count of the mis indications up to the dupthresh=3 threshold level. Equally, an implementation need not track the exact number of packets ahead of line or the exact number of bytes ahead of line of a certain missing TSN once this number surpasses the dupthresh=3 threshold.
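The byte based variant can be sketched as follows. This is an illustration under a stated assumption: the operator mod_PMTU() in the HBNA formula is read here as the number of whole PMTU-sized units contained in the newly acknowledged bytes (integer division by the PMTU). That reading is a guess made for the purpose of the example and is not confirmed by the text. The function names are hypothetical.

```python
DUPTHRESH = 3


def hbna_increment(newly_acked_bytes_ahead, pmtu):
    """Additional mis indications for one missing TSN from one SACK.

    ASSUMPTION: mod_PMTU(x) is interpreted as x // pmtu, i.e. the number of
    whole PMTU-sized units in the newly acknowledged bytes ahead of line.
    """
    if newly_acked_bytes_ahead <= 0:
        return 0  # nothing newly acknowledged ahead of line: TSN not eligible
    return newly_acked_bytes_ahead // pmtu + 1


def hbna_update(miss_count, newly_acked_bytes_ahead, pmtu):
    """New mis indication count, capped at DUPTHRESH as permitted above."""
    return min(DUPTHRESH,
               miss_count + hbna_increment(newly_acked_bytes_ahead, pmtu))
```

Under this reading a SACK newly acknowledging a small chunk ahead of line yields one mis indication, as with HTNA, whereas a SACK newly acknowledging two full PMTUs of data ahead of line yields three.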

This last byte based approach follows the approach taken for TCP, IsLost(), in [RFC6675]. It is noted, however, that due to the message based approach of SCTP, a byte based approach will generally be less accurate as a measure of the number of packets received ahead of line than it is for byte stream based TCP.

OPEN ISSUE: Check alignment with algorithms defined in [HURTIG]. If relevant align.

3.2. RFC6675 nextseg() Tail Loss Enhancements for SCTP FR

The Fast Recovery algorithm for TCP as specified in [RFC6675] implements some differences compared to the fast retransmission algorithm specified for SCTP by [RFC4960]. Of particular significance for recovery of losses in traffic tail scenarios is the fact that the [RFC6675] algorithm, once Fast Recovery has been activated, takes two "last resort" retransmission measures, step 3) and step 4) of NextSeg() of [RFC6675], that facilitate the recovery of losses in situations where only an insufficient number of SACKs could be generated to complete the Fast Recovery process without resorting to T3-timeout. For SCTP Fast Recovery we formulate the equivalent measures as follows:

Last Resort Retransmission:

If the following conditions are met:

there are no outstanding TSNs eligible for fast retransmission due to dupthresh or more mis indications
there is no new data available for transmission

then an outstanding TSN less than or equal to the Fast Recovery Exit Point, for which there exist SACKs of chunks ahead of line of the TSN, may be retransmitted provided the CWND allows. The bytes of a TSN which is retransmitted in this manner are not subtracted from the flight size, neither before this action is taken nor as a result of it. If the mis indication count of the TSN subsequently reaches the dupthresh value, the bytes of the TSN shall be subtracted from the flight size. Once the TSN is acknowledged, its remaining contribution to the flight size (whether it is counted once or twice at that point in time) is subtracted. A TSN which is retransmitted in this manner will be marked as ineligible for a subsequent fast retransmit.
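The Last Resort Retransmission conditions can be sketched as a selection function. The `Chunk` record and the function name are hypothetical. Two points are assumptions of this sketch rather than statements of the text: the lowest qualifying TSN is picked first (as NextSeg() of [RFC6675] does for TCP), and "the CWND allows" is read as the retransmitted bytes fitting on top of the current flight size.

```python
from dataclasses import dataclass

DUPTHRESH = 3


@dataclass
class Chunk:
    tsn: int
    size: int
    miss_count: int = 0
    sacked: bool = False
    fast_retransmitted: bool = False  # also set by a Last Resort retransmit


def last_resort_candidate(chunks, exit_point, new_data_available,
                          flight_size, cwnd):
    """Return the chunk to retransmit as Last Resort, or None."""
    if new_data_available:
        return None
    outstanding = [c for c in chunks if not c.sacked]
    # Regular fast retransmissions take precedence.
    if any(c.miss_count >= DUPTHRESH and not c.fast_retransmitted
           for c in outstanding):
        return None
    highest_sacked = max((c.tsn for c in chunks if c.sacked), default=None)
    if highest_sacked is None:
        return None
    for c in sorted(outstanding, key=lambda ch: ch.tsn):
        if (c.tsn <= exit_point and c.tsn < highest_sacked
                and not c.fast_retransmitted):
            # ASSUMPTION: the bytes are not removed from the flight size, so
            # the retransmission must fit in addition to what is in flight.
            return c if flight_size + c.size <= cwnd else None
    return None
```

A caller would mark the returned chunk as `fast_retransmitted` after sending it, which makes it ineligible for a subsequent fast retransmit as required above.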

Rescue:

If all of the following conditions are met:

there are no outstanding TSNs eligible for fast retransmission due to dupthresh or more mis indications
there is no new data available for transmission
there are no outstanding TSNs eligible for Last Resort retransmission
the cumack has progressed since this entering of Fast Recovery

and there exist non-SACKed, non fast retransmitted TSNs within the Fast Recovery Exit Point, then for this entry of Fast Recovery, provided the CWND allows, we allow for fast retransmission of one packet of consecutive outstanding non fast retransmitted TSNs up to PMTU size, the highest TSN of which MUST be the highest outstanding TSN within the Fast Recovery Exit Point. The bytes of a TSN which is retransmitted in this manner are not subtracted from the flight size, neither before this action is taken nor as a result of it. If the mis indication count of the TSN subsequently reaches the dupthresh value, the bytes of the TSN shall be subtracted from the flight size. Once the TSN is acknowledged, its remaining contribution to the flight size (whether it is counted once or twice at that point in time) is subtracted. A TSN which is retransmitted in this manner will be marked as ineligible for a subsequent fast retransmit.

An implementation of the Rescue operation may be accomplished by maintaining a RescueRTX parameter as described for TCP in [RFC6675].
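The construction of the Rescue packet can be sketched as below. The `Chunk` record, the function name and the two boolean parameters are hypothetical: `preconditions_met` stands for the four listed conditions together with the CWND check, and `rescue_used` plays the role of the once-per-entry state that a RescueRTX style parameter would track.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    tsn: int
    size: int
    sacked: bool = False
    fast_retransmitted: bool = False


def rescue_tsns(chunks, exit_point, pmtu, preconditions_met, rescue_used):
    """Return the TSNs to bundle in the single Rescue packet (may be empty)."""
    if not preconditions_met or rescue_used:
        return []
    outstanding = {c.tsn: c for c in chunks
                   if not c.sacked and c.tsn <= exit_point}
    if not outstanding:
        return []
    # The packet must end at the highest outstanding TSN within the exit
    # point; consecutive lower TSNs are added while they fit in one PMTU.
    tsn = max(outstanding)
    picked, total = [], 0
    while (tsn in outstanding
           and not outstanding[tsn].fast_retransmitted
           and total + outstanding[tsn].size <= pmtu):
        picked.append(tsn)
        total += outstanding[tsn].size
        tsn -= 1
    return sorted(picked)
```

With three outstanding 600 byte chunks (TSNs 21 to 23) and a PMTU of 1500 bytes, the sketch selects TSNs 22 and 23; TSN 21 does not fit and is left for later recovery.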

DISCUSSION: [RFC4960], in addition to the HTNA algorithm, demands additional mis indication counting to be performed during Fast Recovery according to the following prescription (section 7.2.4 of [RFC4960]):

(#): If an endpoint is in Fast Recovery and a SACK arrives that advances the Cumulative TSN Ack Point, the miss indications are incremented for all TSNs reported missing in the SACK.

It is noted that under special circumstances (#) makes SCTP Fast Recovery complete in situations where TCP Fast Recovery would only complete by virtue of measure 3) or 4) of [RFC6675], and as such these measures are more critically demanded for TCP Fast Recovery operation than for SCTP Fast Recovery operation. However, as documented by (to be filled in), the Last Resort Retransmission operation and the Rescue operation significantly improve the Loss Recovery operation also for SCTP, both the latency of the individual loss recovery operation and the ability of the operation to complete without resort to T3-timeout. Consequently this document prescribes that Enhanced SCTP Tail Loss Recovery implement these procedures.

As the algorithm extension is limited by the existing congestion control algorithm of SCTP, these extensions of SCTP Fast Recovery do not compromise the TCP fairness of the SCTP Fast Recovery operation.

3.3. SCTP-TLR Description

3.3.1. Principles

The Tail Loss Recovery function for SCTP is based on the following principles:

Maintain a Tail Loss Probe Timer (PTO) which, except when SCTP is in Fast Recovery or in T3-recovery, is running on the lowest outstanding TSN. The PTO timer value used will depend on the situation:
- By default the following timer value is used:
  PTO1:
  PTO=MIN(RTO, 1.5*SRTT+MAX(RTTVAR, DELAY_ACK))
- Whereas the following value is used:
  PTO2:
  PTO=MIN(RTO, 1.5*SRTT+RTTVAR)
  when it is known that subsequent SACKs not acknowledging the TSN for which the PTO is running will be (or will have been) returned immediately. For more details see Section 3.3.2.
- By design the probe timer is kept lower or equal to the RTO, thereby postponing a potential unnecessary and damaging RTO, as well as generally larger than an anticipated RTT thereby preventing that it kicks in prematurely. I.e., the timer only kicks in at a time where one would have expected to have received a SACK were there no problems.
PTO timer driven transmittal of Tail Loss Probe Packet: Once data is outstanding and the PTO timer expires on the lowest outstanding TSN and no SACKs of any chunks with higher TSN number have arrived, a probe packet, denoted a Tail Loss Probe Packet (TLPP), is sent to probe for network responsiveness (i.e., for SACK of the TLPP) in order to potentially drive proactive entering of Fast Recovery.
- In this situation the PTO timer on the lowest outstanding TSN is cancelled and reset as a T3-timer with value MAX(PTO, RTO-PTO).
- The TLPP sent is chosen as the lowest unsent TSN if such exists and is available for transmittal or, if no such TSN is available, the presently outstanding packet with the highest TSN number. This is done in order to interface as well as possible with standard Fast Recovery, i.e., to create a loss pattern situation that corresponds as closely as possible to how the Fast Recovery algorithm retransmits lost packets.
PTO timer driven entering of Fast Recovery: Process is enforced when network responsiveness is proven (SACK of later sent data than lowest outstanding TSN is available) and (at least) PTO time has elapsed since transmittal of the lowest outstanding TSN.
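The timer values and the probe selection above can be written out as a small sketch. The function names are illustrative only; all times are in the same unit (milliseconds in the example), and `delay_ack` denotes the DELAY_ACK value of the PTO1 formula.

```python
def pto1(rto, srtt, rttvar, delay_ack):
    """Default probe timer: PTO=MIN(RTO, 1.5*SRTT+MAX(RTTVAR, DELAY_ACK))."""
    return min(rto, 1.5 * srtt + max(rttvar, delay_ack))


def pto2(rto, srtt, rttvar):
    """Probe timer when SACKs are known to be returned immediately."""
    return min(rto, 1.5 * srtt + rttvar)


def t3_after_probe(pto, rto):
    """T3 value armed on the lowest outstanding TSN when a TLPP is sent."""
    return max(pto, rto - pto)


def choose_tlpp(lowest_unsent_tsn, highest_outstanding_tsn):
    """Prefer new data; otherwise probe with the highest outstanding TSN."""
    if lowest_unsent_tsn is not None:
        return ("new", lowest_unsent_tsn)
    return ("retransmission", highest_outstanding_tsn)
```

For example, with RTO = 1000 ms, SRTT = 100 ms, RTTVAR = 50 ms and DELAY_ACK = 200 ms, PTO1 is 350 ms and PTO2 is 200 ms. If the PTO1 timer expires and a TLPP is sent, the T3-timer is armed with MAX(350, 1000-350) = 650 ms, so that the total time until a T3-timeout equals the original RTO.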

3.3.2. SCTP - TLR Statemachine

In addition to the Fast Recovery state and the T3-Recovery state the SCTP Tail Loss Recovery function defines 3 states: the SCTP TLR OPEN state, the SCTP TLR PROBE WAIT state and the SCTP TLR DELAY WAIT state. At any given time the SCTP transmission logic will be in one of the 5 states.

Figure 1 illustrates the states and the state transitions.

(to be inserted)

Figure 1, Enhanced Loss Recovery State Machine Diagram

In the following we describe the states and the actions taken.

3.3.2.1. SCTP TLR OPEN STATE

In this state SCTP is performing neither Fast Recovery nor T3-recovery. This is the state entered when SCTP sends the first data after idle. In this state SCTP has outstanding data, a PTO timer is running on the lowest outstanding TSN and the SACK scoreboard has no gaps, i.e., the highest SACK'ed TSN is cumulatively acked.

The PTO set on a new lowest outstanding TSN in this state will follow [PTO1] when less than 2 packets are outstanding at the time when the timer is set and follow [PTO2] when 2 or more packets are outstanding when the PTO timer is set.

In this state the following may happen:

A SACK acknowledging a higher outstanding TSN than the lowest outstanding TSN may arrive, thus proving network responsiveness while still not acknowledging the lowest outstanding TSN. This indicates that either packets are being re-ordered or the lowest outstanding TSN has been lost. The state will now transit to SCTP TLR DELAY WAIT state for potential entering of SCTP TLR driven Fast Recovery if the PTO timer expires before the lowest outstanding TSN has been acknowledged.
The PTO timer on the lowest outstanding TSN may expire, in which case SCTP TLR will send a TLPP, reset the PTO timer on the lowest outstanding TSN to a T3 timer of value MAX(PTO, RTO-PTO) and transit to SCTP TLR PROBE WAIT state to await either the expiry of the T3 timer on the lowest outstanding TSN (network is persistently unresponsive) or proof of network responsiveness and potential entering of SCTP TLR driven Fast Recovery, unless the network responsiveness proof comes in the form of cumulative acknowledgement of the TSN.

3.3.2.2. SCTP TLR PROBE WAIT STATE

In this state the lowest outstanding TSN has remained unSACK’ed for more than PTO time and no indication of network responsiveness has been received (no SACK of higher outstanding TSNs), thus resulting in the transmittal of a TLPP to probe for network responsiveness.

The MAX(PTO, RTO-PTO) T3-value set on the lowest outstanding TSN when sending the TLPP probe and entering this state shall be MAX(PTO1, (RTO-PTO)_previous), where (RTO-PTO)_previous is the value of RTO-PTO at the time the PTO timer was previously set on the lowest outstanding TSN.

In this state then the following may happen:

A SACK cumulatively acknowledging all holes including the lowest outstanding TSN will bring the SCTP TLR state machine back to SCTP TLR OPEN state and the PTO timer will be restarted on the new lowest outstanding TSN.
A SACK will arrive for a higher outstanding TSN with the lowest outstanding TSN remaining unSACK’ed. This will result in declaration of the lowest outstanding TSN as lost and will make SCTP enter Fast Recovery with the exit point being set to the highest outstanding TSN as normal.
A SACK will arrive that acknowledges the lowest outstanding TSN, and the PTO timer is reset on the new lowest outstanding TSN, but data of higher TSN than the new lowest outstanding TSN is also acknowledged in the SACK. In this case there is indication that either packet re-ordering has occurred or the new lowest outstanding TSN has been lost. The state will now transit to SCTP TLR DELAY WAIT state for potential entering of SCTP TLR driven Fast Recovery if the PTO timer expires before the new lowest outstanding TSN has been acknowledged.

3.3.2.3. SCTP TLR DELAY WAIT STATE

In this state proof of network responsiveness has been received (in the form of a SACK of a higher TSN than the lowest outstanding TSN) and the PTO timer on the lowest outstanding TSN is running for potential entering of SCTP TLR driven Fast Recovery.

The PTO set on a new lowest outstanding TSN in this state will be [PTO2].

In this state then the following may happen:

The PTO timer on the lowest outstanding TSN expires. This will result in declaration of the lowest outstanding TSN as lost and will make SCTP enter Fast Recovery with the exit point being set to the highest outstanding TSN as normal.
A SACK cumulatively acknowledging all holes including the lowest outstanding TSN will bring the SCTP TLR state machine back to SCTP TLR OPEN state and the PTO timer will be restarted on the new lowest outstanding TSN.
A SACK will arrive that acknowledges the lowest outstanding TSN, and the PTO timer is reset on the new lowest outstanding TSN, but data of higher TSN than the new lowest outstanding TSN is also acknowledged in the SACK. In this case there is indication that either packet re-ordering has occurred or the new lowest outstanding TSN has been lost. The state will remain in SCTP TLR DELAY WAIT state for potential entering of SCTP TLR driven Fast Recovery if the PTO timer expires before the new lowest outstanding TSN has been acknowledged.
A SACK will arrive that does not acknowledge the lowest outstanding TSN. In this situation no changes are made to the running PTO timer and the state will remain in SCTP TLR DELAY WAIT state for potential entering of SCTP TLR driven Fast Recovery if the PTO timer expires before the lowest outstanding TSN has been acknowledged.

3.3.2.4. Exit of Loss Recovery

After exit of Fast Recovery or T3-Recovery, if data is outstanding, a PTO timer is started on the lowest outstanding TSN and the state transits to either SCTP TLR OPEN state or SCTP TLR DELAY WAIT state depending on the status of the SACK scoreboard (i.e., whether gaps exist or not). The PTO timer set will follow the rules described above.
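Pending the insertion of Figure 1, the transitions described in the subsections above can be summarised by the following sketch. It is an illustration, not a normative definition. A SACK is abstracted into two hypothetical booleans: `lowest_acked` (the SACK acknowledges the lowest outstanding TSN) and `gaps_remain` (after processing the SACK, some TSN above the lowest outstanding TSN is SACK'ed). Data is assumed to remain outstanding, timer re-arming and TLPP transmission are left out, and the processing inside Fast Recovery and T3-Recovery is left to [RFC4960].

```python
from enum import Enum, auto


class State(Enum):
    OPEN = auto()
    PROBE_WAIT = auto()
    DELAY_WAIT = auto()
    FAST_RECOVERY = auto()
    T3_RECOVERY = auto()


def on_sack(state, lowest_acked, gaps_remain):
    """Next state after a SACK, for the three SCTP TLR states."""
    if state in (State.FAST_RECOVERY, State.T3_RECOVERY):
        return state  # handled by the regular recovery procedures
    if not gaps_remain:
        # All holes are cumulatively acknowledged: back to (or stay in) OPEN.
        return State.OPEN if lowest_acked else state
    if state is State.PROBE_WAIT and not lowest_acked:
        # Responsiveness proven while the lowest TSN is still missing.
        return State.FAST_RECOVERY
    return State.DELAY_WAIT


def on_timer_expiry(state, timer):
    """Next state after expiry of the 'PTO' or 'T3' timer."""
    if timer == "PTO" and state is State.OPEN:
        return State.PROBE_WAIT      # a TLPP is sent and T3 is armed
    if timer == "PTO" and state is State.DELAY_WAIT:
        return State.FAST_RECOVERY   # lowest outstanding TSN declared lost
    if timer == "T3" and state is State.PROBE_WAIT:
        return State.T3_RECOVERY     # network persistently unresponsive
    return state


def on_recovery_exit(gaps_remain):
    """State entered after Fast Recovery or T3-Recovery, data outstanding."""
    return State.DELAY_WAIT if gaps_remain else State.OPEN
```

For instance, starting in OPEN, a PTO expiry leads to PROBE_WAIT; a subsequent SACK of the probe that leaves the lowest outstanding TSN unacknowledged leads to FAST_RECOVERY, whereas a T3 expiry without any SACK leads to T3_RECOVERY.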

3.3.3. TLPP Transmission Rules

The transmission of a Tail Loss Probe Packet (TLPP), done when entering the SCTP TLR PROBE WAIT state, is governed by the following details:

TLPP of new data is always preferred if available.
TLPP of new data is a full-sized packet.
TLPP of retransmission data is one TSN chunk. A TLPP of retransmission data counts twice in the in-flight until acknowledged.

The motivation for sending a TLPP of retransmission data in the form of one chunk only is that demasking of loss recovery by the TLPP (see Section 3.3.4) is simpler when only one TSN has been used as a probe.

TLPP Transmission conditions:

If no TLPP is outstanding, a probe is sent unconditionally of CWND.
If a TLPP is outstanding, a probe is sent conditionally to flightsize < CWND + 1PMTU, otherwise no TLPP is sent.
If no new data exists, a probe of retransmission data is sent conditional to whether a TLPP is already outstanding, as follows:
- If no TLPP is outstanding, send a TLPP consisting of the highest outstanding TSN.
- If a TLPP is outstanding, then if and only if the probe is the highest outstanding TSN may it be resent. Otherwise no TLPP is sent.

The above rules are defined to support detection of TLPP recovered losses by the algorithm described in Section 3.3.4.
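The transmission conditions can be sketched as a single decision function. The function and parameter names are hypothetical. One interpretation is assumed and should be checked against the intended rules: the two CWND related conditions are read as applying to probes of new data, while the last two conditions are read as applying when no new data exists.

```python
def tlpp_decision(new_data_available, tlpp_outstanding, flight_size, cwnd,
                  pmtu, probe_is_highest_outstanding):
    """Return 'new', 'retransmission' or None (no TLPP is sent).

    ASSUMPTION: the CWND conditions govern probes of new data only; probes
    of retransmission data are governed by the highest-outstanding-TSN rule.
    """
    if new_data_available:
        if not tlpp_outstanding:
            return "new"  # sent unconditionally of CWND
        # A further probe is allowed at most one PMTU beyond the CWND.
        return "new" if flight_size < cwnd + pmtu else None
    if not tlpp_outstanding:
        return "retransmission"  # probe with the highest outstanding TSN
    # Resend only if the outstanding probe is the highest outstanding TSN.
    return "retransmission" if probe_is_highest_outstanding else None
```

With a CWND of 3000 bytes, a PMTU of 1500 bytes and a TLPP already outstanding, a second probe of new data is sent at a flight size of 4000 bytes but not at 4500 bytes.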

3.3.4. TLPP Recovered Losses

If a single SCTP packet is lost, there is a risk that the TLPP packet itself might repair the loss if that particular lost packet is used as the probe. This masking problem is only present if the TLPP is based on retransmission data (i.e., not if the TLPP is based on new data). The TLPP might mask the loss and thus interfere with the congestion control principle that requires CWND halving when a loss is detected.

At present the solution in this document uses the algorithm defined for this purpose in [DUKKIPATI01], with a slight adjustment for SCTP to rely on the D-SACK (duplicate TSN received) information available from the SCTP SACK. The solution operates with a conceptual TLPP Retransmission Episode, as follows:

- Once a TLPP packet consisting of retransmission data is sent, a TLPP Retransmission Episode is started. The TLPP Retransmission Episode is over when an incoming SACK cumulatively acknowledges a sequence number higher than the sequence number of the TLPP probe with retransmission data.
- CWND halving is done at the termination of a TLPP Retransmission Episode if, at that point, the number of times the TLPP TSN has been received, according to the D-SACK information communicated, is lower than the number of times the TLPP TSN has been sent.
- A TLPP Retransmission Episode is abruptly terminated if Fast Recovery or T3-Recovery is entered.
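
The episode bookkeeping above can be illustrated with a small sketch. The class and method names are hypothetical. Two assumptions are made: the send count includes the original transmission of the TSN, and the receive count is one (inferred from the cumulative acknowledgement) plus the number of duplicate reports for the probe TSN. TSNs are compared as plain integers, ignoring wraparound.

```python
class TlppEpisode:
    """Tracks one TLPP Retransmission Episode for a single probe TSN."""

    def __init__(self, probe_tsn):
        self.probe_tsn = probe_tsn
        # ASSUMPTION: original transmission plus the first probe.
        self.times_sent = 2
        self.dup_reports = 0
        self.active = True

    def on_probe_resent(self):
        """Call when the same TSN is sent again as a probe."""
        self.times_sent += 1

    def on_sack(self, cum_tsn_ack, duplicate_tsns):
        """Process a SACK; return True if CWND should be halved."""
        if not self.active:
            return False
        # Count D-SACK (duplicate TSN) reports for the probe TSN.
        self.dup_reports += list(duplicate_tsns).count(self.probe_tsn)
        # The episode ends only when the cumulative ack passes the probe.
        if cum_tsn_ack <= self.probe_tsn:
            return False
        self.active = False
        times_received = 1 + self.dup_reports
        # Fewer receptions than transmissions: a loss was masked.
        return times_received < self.times_sent

    def abort(self):
        """Fast Recovery or T3-Recovery entered: end without halving."""
        self.active = False
```

If the original chunk was lost and only the probe arrived, no duplicate is reported, so the count is 1 received against 2 sent and CWND is halved. If both copies arrived, the duplicate report brings the count to 2 and no halving takes place.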

OPEN ISSUE: The above solution is vulnerable to spurious CWND halving when a TLPP packet is reordered relative to a subsequently sent new data chunk. A possible solution, contemplated for a number of reasons for SCTP, is to extend SCTP to distinguish retransmitted chunks from original chunks.

3.4. SCTP MH Considerations

The functions defined have been implemented for SCTP MH. MH aspects to be filled in.

4. Evaluation of function

Experiments in progress. Details to be filled in.

5. Socket API Considerations

This section will describe how the socket API defined in [RFC6458] is extended to provide a way for the application to control the retransmission algorithms in operation in the SCTP layer.

Socket options for control of the features are yet to be defined.

Please note that this section is informational only.

6. Security Considerations

There are no new security considerations introduced by the functions defined in this document.

7. Acknowledgements

The author acknowledges Henrik Jensen for his very significant contributions to the definition of, the implementation of, and the experiments with the function.

The work draws heavily on prior art for TCP, [DUKKIPATI01] in particular. The contributors of that work should be credited for many of the ideas put forward here for SCTP.

8. IANA Considerations

This document does not create any new registries or modify the rules for any existing registries managed by IANA.

9. References

9.1. Normative References

[RFC2119]	Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4960]	Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.
[RFC5062]	Stewart, R., Tuexen, M. and G. Camarillo, "Security Attacks Found Against the Stream Control Transmission Protocol (SCTP) and Current Countermeasures", RFC 5062, September 2007.
[RFC6675]	Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M. and Y. Nishida, "A Conservative Loss Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP", RFC 6675, August 2012.

9.2. Informative References

[DUKKIPATI01]	Dukkipati, N., Cardwell, N., Cheng, Y. and M. Mathis, "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of Tail", Work in Progress (expired), February 2013.
[DUKKIPATI02]	Dukkipati, N., Mathis, M., Cheng, Y. and M. Ghobadi, "Proportional Rate Reduction for TCP", Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement, November 2011.
[HURTIG]	Hurtig, P., et al., "TCP and SCTP RTO Restart", draft-ietf-tcpm-rtorestart-03, Work in Progress, July 2014.
[MATHIS]	Mathis, M., "FACK", ACM SIGCOMM Computer Communication Review 26(4), October 1996.
[RFC5681]	Allman, M., Paxson, V. and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.
[RFC5827]	Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J. and P. Hurtig, "Early Retransmit for TCP and Stream Control Transmission Protocol (SCTP)", RFC 5827, May 2010.
[RFC6458]	Stewart, R., Tuexen, M., Poon, K., Lei, P. and V. Yasevich, "Sockets API Extensions for the Stream Control Transmission Protocol (SCTP)", RFC 6458, December 2011.

Author's Address

Karen E. E. Nielsen
Ericsson
Kistavaegen 25
Stockholm, 164 80
Sweden

EMail: karen.nielsen@tieto.com