Internet-Draft Fault Management in MPTCP August 2020
Kang & Liang Expires 10 February 2021
Workgroup:
TCP Maintenance and Minor Extensions
Internet-Draft:
draft-kang-tcpm-fault-management-in-mptcp-session-00
Published:
August 2020
Intended Status:
Informational
Expires:
10 February 2021
Authors:
J. Kang, Ed.
Huawei
Q. Liang
Huawei

Fault Management Mechanism in MPTCP Session

Abstract

This document presents a mechanism for fault management during an MPTCP session. The mechanism conveys subflow failure information from the client to the server via another subflow that is operating normally. It includes: 1) a new Fault Announce option that describes a subflow failure, and 2) the use and interoperability of this option during an MPTCP session when one subflow suffers a failure. Based on the fault information that multiple clients report for their connections, the server can then determine network problems accurately.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 10 February 2021.


1. Introduction

During data transmission in an MPTCP session, subflows may encounter problems such as a port failure on one endpoint, a network failure, or a malfunctioning middlebox. The current MPTCP protocol provides no exchange between client and server when a fault occurs on a subflow, even though such a fault can cause transmission failure or delay.

[RFC8684] introduces the TCP RST Reason (MP_TCPRST) option to signal the reason for sending a RST on a subflow, which can help an implementation decide whether to attempt a later reconnection. However, MP_TCPRST only reports why a specific subflow, already determined to be closed, is being reset. It does not cover the case where an ongoing subflow terminates abnormally.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Fault Announce Exchanges

This document proposes a fault announce mechanism with a new option that delivers failure information about an abnormal subflow from the client to the server via another subflow of the same MPTCP session that works properly. The flow is illustrated in Figure 1.

       +--------+                               +--------+
       | Client |                               | Server |
       +--------+                               +--------+
           |                                         |
           |<---MPTCP Session setup with subflows--->|
           |                                         |
Determine that one ongoing subflow                   |
      is faulty                                      |
           |                                         |
           |-------Send Fault Announce option------->|
           |     indicating subflow failure via      |
           |           another subflow               |
           |                                         |
           |                                         |
           |                                         |
Figure 1: Client sends Fault Announce to server during an MPTCP Session

The Fault Announce option is carried in SYN, ACK, or data packets.
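The client side of the exchange in Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the mechanism's definition: the `Subflow` record, its `healthy` flag and the `choose_carrier()` helper are hypothetical names, and this document does not say how the carrying subflow is selected. Taking the first healthy subflow other than the faulty one is only one simple policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subflow:
    """Hypothetical client-side view of one subflow."""
    src_addr_id: int      # address ID of the local (client) address
    dst_addr_id: int      # address ID of the remote (server) address
    healthy: bool = True  # cleared once the subflow is judged faulty


def choose_carrier(subflows: List[Subflow], faulty: Subflow) -> Optional[Subflow]:
    """Pick a working subflow, other than the faulty one, to carry the option.

    Returns None when no other healthy subflow exists; the Fault Announce
    option then cannot be delivered within this MPTCP session.
    """
    for sf in subflows:
        # Identity check: two subflows may compare equal field-by-field.
        if sf is not faulty and sf.healthy:
            return sf
    return None


flows = [Subflow(0, 0), Subflow(1, 0), Subflow(2, 1)]
flows[0].healthy = False                 # subflow (0, 0) has failed
carrier = choose_carrier(flows, flows[0])
print(carrier.src_addr_id, carrier.dst_addr_id)  # 1 0
```

If the only remaining subflows are also unhealthy, the helper returns None, which reflects the mechanism's dependence on at least one properly working subflow.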

The client may detect a local fault, for example, a local port or network card failure, or an error in local protocol processing. In this case, the client can determine the fault cause directly.

The client may also actively detect subflow failure by running a detection task that determines the fault cause. For example, the client may deploy Bidirectional Forwarding Detection (BFD) to determine whether the subflow is faulty.

The client may also send an ICMP request to the server and detect exceptions from the response time. Specifically, if the client does not receive a response within a preset time, it concludes that the subflow is not working properly.
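The timeout rule above can be sketched as a small tracker of outstanding probes. This is a hypothetical illustration: the `ProbeMonitor` class, its method names and the 2-second timeout are assumptions, and sending the actual ICMP requests is out of scope. Times are passed in explicitly so the example is deterministic; a real client would read a monotonic clock such as `time.monotonic()`.

```python
from typing import Dict


class ProbeMonitor:
    """Hypothetical tracker for probes (e.g. ICMP requests) on one subflow.

    A probe that is still unanswered `timeout` seconds after it was sent
    marks the subflow as not working properly.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.outstanding: Dict[int, float] = {}  # probe id -> send time

    def sent(self, probe_id: int, now: float) -> None:
        self.outstanding[probe_id] = now

    def replied(self, probe_id: int) -> None:
        # Replies for unknown ids (duplicates, stale probes) are ignored.
        self.outstanding.pop(probe_id, None)

    def is_faulty(self, now: float) -> bool:
        return any(now - t >= self.timeout for t in self.outstanding.values())


mon = ProbeMonitor(timeout=2.0)
mon.sent(1, now=0.0)
mon.replied(1)                 # answered in time
mon.sent(2, now=1.0)           # never answered
print(mon.is_faulty(now=2.5))  # False: probe 2 has waited only 1.5 s
print(mon.is_faulty(now=3.0))  # True: probe 2 unanswered for 2.0 s
```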

Another way for the client to determine the fault cause is an ICMP error message: the client may receive an ICMP error message from a third device (e.g., a middlebox on the faulty subflow) that indicates the fault cause.
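As an illustration of how a client might tie an ICMP error message back to a subflow, the sketch below parses an ICMPv4 error message. It relies on the RFC 792 rule that an error message quotes the original IP header plus the first 64 bits of the original payload, which for TCP include the source and destination ports. The quoted addresses and ports identify the affected subflow, and the ICMP type and code hint at the cause. Only IPv4 is shown (ICMPv6 is analogous), the checksum is not verified, the function name is hypothetical, and the mapping from ICMP codes to "Cause" values is not shown.

```python
import ipaddress
import struct
from typing import NamedTuple


class IcmpErrorInfo(NamedTuple):
    icmp_type: int
    icmp_code: int
    src: ipaddress.IPv4Address  # source of the quoted (original) datagram
    dst: ipaddress.IPv4Address  # destination of the quoted datagram
    src_port: int
    dst_port: int


def parse_icmpv4_error(msg: bytes) -> IcmpErrorInfo:
    """Parse an ICMPv4 error message, starting at the ICMP header."""
    if len(msg) < 8 + 20:
        raise ValueError("message too short")
    icmp_type, icmp_code = msg[0], msg[1]
    inner = msg[8:]                # skip type, code, checksum, 4-octet field
    if inner[0] >> 4 != 4 or inner[9] != 6:
        raise ValueError("quoted datagram is not IPv4/TCP")
    ihl = (inner[0] & 0x0F) * 4    # quoted IP header length in octets
    if ihl < 20 or len(inner) < ihl + 4:
        raise ValueError("quoted TCP ports missing")
    src_port, dst_port = struct.unpack("!HH", inner[ihl:ihl + 4])
    return IcmpErrorInfo(icmp_type, icmp_code,
                         ipaddress.IPv4Address(inner[12:16]),
                         ipaddress.IPv4Address(inner[16:20]),
                         src_port, dst_port)


# Destination Unreachable (type 3), code 1 (host unreachable), quoting a
# client segment 192.0.2.1:50000 -> 198.51.100.7:443.
quoted_ip = bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, 6, 0, 0,
                   192, 0, 2, 1, 198, 51, 100, 7])
quoted_tcp = struct.pack("!HHI", 50000, 443, 0)
info = parse_icmpv4_error(bytes([3, 1, 0, 0, 0, 0, 0, 0]) + quoted_ip + quoted_tcp)
print(info.icmp_type, info.icmp_code, info.src, info.dst_port)  # 3 1 192.0.2.1 443
```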

3. Fault Announce option

A new Fault Announce option is defined to describe in detail a fault occurring on one subflow. When the option is present, the faulty subflow is identified by its source address ID (SrcAddressID) and destination address ID (DestAddressID). The mapping between IP addresses and address IDs should be established on both the client and the server through the ADD_ADDR process defined in [RFC8684] and [RFC6824].
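The address ID bookkeeping this paragraph relies on can be sketched as a per-connection table. This is a hypothetical illustration: the `AddressIdTable` class and its methods are assumptions, not an MPTCP API. It reflects two points from [RFC8684]: each host numbers its own addresses (so local and remote IDs are separate spaces), and the addresses of the initial subflow have address ID 0. The client maps a faulty subflow's addresses to IDs; the server maps the received IDs back, reading SrcAddressID as a peer (client) address.

```python
import ipaddress
from typing import Dict, Tuple, Union

Addr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def _lookup(table: Dict[Addr, int], addr_id: int) -> Addr:
    for addr, i in table.items():
        if i == addr_id:
            return addr
    raise KeyError(addr_id)


class AddressIdTable:
    """Hypothetical per-connection mapping between addresses and address IDs."""

    def __init__(self) -> None:
        self.local: Dict[Addr, int] = {}   # IDs this host assigned
        self.remote: Dict[Addr, int] = {}  # IDs learned from the peer (e.g. ADD_ADDR)

    def learn_local(self, addr: str, addr_id: int) -> None:
        self.local[ipaddress.ip_address(addr)] = addr_id

    def learn_remote(self, addr: str, addr_id: int) -> None:
        self.remote[ipaddress.ip_address(addr)] = addr_id

    def ids_for(self, local_addr: str, remote_addr: str) -> Tuple[int, int]:
        """Client side: (SrcAddressID, DestAddressID) of a faulty subflow."""
        return (self.local[ipaddress.ip_address(local_addr)],
                self.remote[ipaddress.ip_address(remote_addr)])

    def addresses_for(self, src_id: int, dest_id: int) -> Tuple[Addr, Addr]:
        """Server side: SrcAddressID names a peer address, DestAddressID a local one."""
        return _lookup(self.remote, src_id), _lookup(self.local, dest_id)


client = AddressIdTable()
client.learn_local("192.0.2.1", 0)      # initial subflow address
client.learn_local("198.51.100.1", 1)   # second client address
client.learn_remote("203.0.113.5", 0)
server = AddressIdTable()
server.learn_local("203.0.113.5", 0)
server.learn_remote("192.0.2.1", 0)
server.learn_remote("198.51.100.1", 1)

ids = client.ids_for("198.51.100.1", "203.0.113.5")
print(ids)                          # (1, 0)
print(*server.addresses_for(*ids))  # 198.51.100.1 203.0.113.5
```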

3.1. Option format

The format of the Fault Announce option (FAULT_ANNOUNCE) is depicted in Figure 2:

                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
|     Kind      |    Length     |Subtype| (rsv) |     Cause     |
+---------------+---------------+-------+-------+---------------+
| DestAddressID |  SrcAddressID |                               |
+---------------+---------------+-------------------------------+

Figure 2: Fault Announce (FAULT_ANNOUNCE) Option

A new MPTCP option subtype should be allocated to indicate the Fault Announce option. As for all MPTCP options, "Kind" is 30 [RFC8684].

"Cause" is an 8-bit field carrying a reason code that describes why the subflow malfunctions. The client detects the fault and determines the cause. The following values (partially mapped to the codes in ICMP error messages) are defined in this document:

"SrcAddressID" identifies the source address ID of the faulty subflow.

"DestAddressID" identifies the destination address ID of the faulty subflow.
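The layout in Figure 2 can be encoded and decoded as in the sketch below. Several values are assumptions rather than part of this document: the subtype is a placeholder (0xF) because the real value is still to be assigned by IANA; the option length is not stated, so the sketch assumes only the six labeled octets (Length = 6) and ignores the unlabeled area in the second row of Figure 2; and the "(rsv)" bits are sent as zero and ignored on receipt. The cause value 1 in the example is arbitrary. Kind 30 is the TCP option kind shared by all MPTCP options.

```python
import struct
from typing import NamedTuple

MPTCP_KIND = 30               # TCP option kind used by all MPTCP options
FAULT_ANNOUNCE_SUBTYPE = 0xF  # HYPOTHETICAL placeholder until IANA assigns a value
FAULT_ANNOUNCE_LEN = 6        # ASSUMPTION: only the six labeled octets of Figure 2


class FaultAnnounce(NamedTuple):
    cause: int         # 8-bit reason code
    dest_addr_id: int  # DestAddressID (precedes SrcAddressID on the wire)
    src_addr_id: int   # SrcAddressID


def encode(opt: FaultAnnounce) -> bytes:
    if any(not 0 <= v <= 0xFF for v in opt):
        raise ValueError("each field is 8 bits")
    # Subtype fills the high nibble of the third octet; (rsv) is sent as 0.
    return struct.pack("!6B", MPTCP_KIND, FAULT_ANNOUNCE_LEN,
                       FAULT_ANNOUNCE_SUBTYPE << 4,
                       opt.cause, opt.dest_addr_id, opt.src_addr_id)


def decode(data: bytes) -> FaultAnnounce:
    if len(data) < FAULT_ANNOUNCE_LEN:
        raise ValueError("truncated option")
    kind, length, sub_rsv, cause, dest_id, src_id = struct.unpack("!6B", data[:6])
    # Only the high nibble is checked; the (rsv) nibble is ignored.
    if (kind, length, sub_rsv >> 4) != (MPTCP_KIND, FAULT_ANNOUNCE_LEN,
                                        FAULT_ANNOUNCE_SUBTYPE):
        raise ValueError("not a Fault Announce option")
    return FaultAnnounce(cause, dest_id, src_id)


wire = encode(FaultAnnounce(cause=1, dest_addr_id=0, src_addr_id=1))
print(wire.hex())    # 1e06f0010001
print(decode(wire))  # FaultAnnounce(cause=1, dest_addr_id=0, src_addr_id=1)
```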

3.2. Additional requirements to be considered

3.2.1. Scenario of middlebox failure

In some deployments, it is a middlebox failure that blocks a subflow. The client should therefore report information about the faulty middlebox to the server in the Fault Announce option, so that the server can quickly locate it. The information about a faulty middlebox may include:

Middlebox IP: The IP address of the faulty middlebox.

IP protocol version: The IP version of the faulty middlebox's address, i.e., IPv4 or IPv6. The server can use it to parse the "Middlebox IP" field.

Flag 'A': If the "Middlebox IP" field is optional, this flag should be defined to indicate whether the "Middlebox IP" field is carried in the Fault Announce option.
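This document does not define a wire layout for these fields, so the following is only a sketch under a hypothetical layout: one flags octet in which one bit is Flag 'A' and another bit is the IP version, followed by a 4-octet (IPv4) or 16-octet (IPv6) Middlebox IP when 'A' is set. The bit positions and function names are assumptions. The sketch shows why the version field matters: the server uses it to know how many octets of Middlebox IP to read.

```python
import ipaddress
from typing import Optional, Tuple, Union

Addr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

FLAG_A = 0x80   # HYPOTHETICAL bit: 'A', Middlebox IP present
FLAG_V6 = 0x40  # HYPOTHETICAL bit: IP version, set = IPv6, clear = IPv4


def encode_middlebox(addr: Optional[str]) -> bytes:
    """Encode the hypothetical flags octet plus an optional Middlebox IP."""
    if addr is None:
        return bytes([0])                    # 'A' clear: no address follows
    ip = ipaddress.ip_address(addr)
    flags = FLAG_A | (FLAG_V6 if ip.version == 6 else 0)
    return bytes([flags]) + ip.packed


def decode_middlebox(data: bytes) -> Tuple[Optional[Addr], int]:
    """Return (Middlebox IP or None, number of octets consumed)."""
    flags = data[0]
    if not flags & FLAG_A:
        return None, 1
    v6 = bool(flags & FLAG_V6)
    size = 16 if v6 else 4                   # the version says how much to read
    raw = data[1:1 + size]
    if len(raw) != size:
        raise ValueError("truncated Middlebox IP")
    ip = ipaddress.IPv6Address(raw) if v6 else ipaddress.IPv4Address(raw)
    return ip, 1 + size


print(encode_middlebox("192.0.2.9").hex())  # 80c0000209
print(decode_middlebox(encode_middlebox("2001:db8::1")))  # (IPv6Address('2001:db8::1'), 17)
```

Under this layout an IPv6 middlebox address adds 17 octets to the option, which is significant given TCP's 40-octet limit on option space.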

3.2.2. Scenario of distinguishing fault types

In some possible implementations, faults are classified as transient or non-transient. A "fault type" field may therefore be added to identify the type of fault (transient or non-transient) for subsequent processing.
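One possible form of that subsequent processing is server-side correlation of reports from several clients, which is how the server can locate network problems. The sketch below is hypothetical throughout: the `FaultAggregator` class, the `min_clients` threshold, and the policy of counting only non-transient faults are assumptions. Reports are keyed by whatever identifies the fault location, for example a reported Middlebox IP, and each client is counted once per location.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, Hashable, List, Set


class FaultType(Enum):
    TRANSIENT = 0
    NON_TRANSIENT = 1


class FaultAggregator:
    """Hypothetical server-side correlation of Fault Announce reports."""

    def __init__(self, min_clients: int) -> None:
        self.min_clients = min_clients
        # location -> set of distinct clients that reported a fault there
        self.reporters: Dict[Hashable, Set[Hashable]] = defaultdict(set)

    def report(self, client: Hashable, location: Hashable,
               fault_type: FaultType) -> None:
        # In this sketch, transient faults are not counted toward localization.
        if fault_type is FaultType.NON_TRANSIENT:
            self.reporters[location].add(client)

    def suspected(self) -> List[Hashable]:
        """Locations reported by at least `min_clients` distinct clients."""
        return [loc for loc, clients in self.reporters.items()
                if len(clients) >= self.min_clients]


agg = FaultAggregator(min_clients=2)
agg.report("client-1", "192.0.2.9", FaultType.NON_TRANSIENT)
agg.report("client-1", "192.0.2.9", FaultType.NON_TRANSIENT)  # same client again
agg.report("client-2", "198.51.100.4", FaultType.TRANSIENT)
print(agg.suspected())  # []
agg.report("client-2", "192.0.2.9", FaultType.NON_TRANSIENT)
print(agg.suspected())  # ['192.0.2.9']
```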

4. IANA Considerations

IANA is requested to assign an MPTCP option subtype for the Fault Announce option.

5. Security Considerations

The Fault Announce option is neither encrypted nor authenticated, so on-path attackers and middleboxes could remove, add, or modify this option on observed Multipath TCP connections.

6. References

6.1. Normative References

[RFC0793]
Postel, J., "Transmission Control Protocol", STD 7, RFC 793, DOI 10.17487/RFC0793, September 1981, <https://www.rfc-editor.org/info/rfc793>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC6824]
Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, <https://www.rfc-editor.org/info/rfc6824>.
[RFC8684]
Ford, A., Raiciu, C., Handley, M., Bonaventure, O., and C. Paasch, "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 8684, DOI 10.17487/RFC8684, March 2020, <https://www.rfc-editor.org/info/rfc8684>.

Authors' Addresses

Jiao Kang (editor)
Huawei
D2-03, Huawei Industrial Base
Shenzhen
China
Qiandeng Liang
Huawei
No. 207, Jiufeng 3rd Road, East Lake High-tech Development Zone
Wuhan
China