BFD Stability
O3b Networks
mishra.ashesh@outlook.com
Cisco Systems
170 W. Tasman Drive
San Jose
CA
95134
USA
mjethanandani@gmail.com
www.cisco.com
Ciena Corporation
3939 North 1st Street
San Jose
CA
95134
USA
ankurpsaxena@gmail.com
www.ciena.com
Juniper Networks
Juniper Networks, Exora Business Park
Bangalore
Karnataka
560103
India
santoshpk@juniper.net
Huawei
mach.chen@huawei.com
China Mobile
32 Xuanwumen West Street
Beijing
Beijing
China
fanp08@gmail.com
Network
Routing Working Group
Internet-Draft
This document describes extensions to the Bidirectional Forwarding
Detection (BFD) protocol to measure BFD stability. Specifically, it
describes a mechanism for detection of BFD frame loss.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
The Bidirectional Forwarding Detection (BFD) protocol operates by transmitting and
receiving control frames, generally at high frequency, over the datapath
being monitored. In order to prevent significant data loss due to a
datapath failure, the tolerance for lost or delayed frames in the
Detection Time, as defined in BFD, is set
to the smallest feasible value.
This document proposes a mechanism to detect lost frames in a BFD
session in addition to the datapath fault detection mechanisms of BFD.
Such a mechanism is valuable for measuring the stability of BFD
sessions and gives operators data on the cause of a BFD failure.
This document does not propose a BFD extension to measure data traffic
loss or delay on a link or tunnel; its scope is limited to BFD
frames.
Legacy BFD cannot detect BFD frame loss unless the loss lasts for the
full dead interval. This draft proposes a method to detect a dropped
frame on the receiver. For example, if the receiver receives BFD CC
frame k at time t and then receives frame k+3 at time t+10ms, without
ever receiving frame k+1 and/or frame k+2, then it has experienced a
drop.
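The example above can be sketched in a few lines. This is a minimal
illustration, not part of the proposal: the function name
missing_sequence_numbers is hypothetical, frames are assumed to arrive
in order, and Sequence Number wrap-around is deliberately ignored.

```python
def missing_sequence_numbers(prev_seq: int, cur_seq: int) -> list[int]:
    """Return the sequence numbers skipped between two frames
    that were received one after the other.

    Assumption: cur_seq > prev_seq and no wrap-around occurred.
    """
    return list(range(prev_seq + 1, cur_seq))


# Receiver saw frame k, then frame k+3: frames k+1 and k+2 were dropped.
k = 100
print(missing_sequence_numbers(k, k + 3))
```

A gap of zero (frame k followed by frame k+1) yields an empty list,
which corresponds to no detected drop.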
This proposal enables the BFD engine to generate diagnostic
information on the health of each BFD session. That information could
be used to preempt a failure on a link that BFD is monitoring by
allowing time for corrective action to be taken.
In a faulty datapath scenario, an operator can use BFD health
information to trigger a delay and loss measurement OAM protocol
(Connectivity Fault Management (CFM) or Loss Measurement (LM)-Delay
Measurement (DM)) to further isolate the issue.
The functionality proposed for BFD stability measurement is achieved
by appending the Null-Authentication TLV (as defined in Optimizing BFD
Authentication) to BFD control frames that do not have
authentication enabled.
This mechanism allows the operator to measure the loss of BFD CC
frames.
When using MD5 or SHA authentication, BFD uses an authentication TLV
that carries the Sequence Number. However, if non-meticulous
authentication is being used, or no authentication is in use, then the
non-authenticated BFD frames MUST include the NULL-Auth TLV.
Loss measurement counts the number of BFD control frames missed at
the receiver during any Detection Time period. The loss is detected by
comparing the Sequence Number field in the Auth TLV (NULL or
otherwise) in successive BFD CC frames. The Sequence Number in each
successive control frame generated on a BFD session by the transmitter
is incremented by one.
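One way a receiver might keep the per-Detection-Time loss count is
sketched below. This is an illustration under stated assumptions: the
class LossWindow and its methods are hypothetical names, time is
expressed in integer milliseconds, and the number of frames missed at
each arrival is assumed to have been computed already from the
Sequence Number gap.

```python
from collections import deque


class LossWindow:
    """Count frames missed during the most recent Detection Time period."""

    def __init__(self, detection_time_ms: int):
        self.detection_time_ms = detection_time_ms
        self._events = deque()  # (arrival time in ms, frames missed)

    def record(self, now_ms: int, missed: int) -> None:
        """Store a loss event observed when a frame arrived at now_ms."""
        if missed > 0:
            self._events.append((now_ms, missed))

    def lost_in_window(self, now_ms: int) -> int:
        """Return frames missed within the last Detection Time period."""
        cutoff = now_ms - self.detection_time_ms
        # Discard loss events that have aged out of the window.
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return sum(missed for _, missed in self._events)


window = LossWindow(detection_time_ms=30)
window.record(now_ms=10, missed=2)
window.record(now_ms=20, missed=1)
print(window.lost_in_window(now_ms=25))
```

A sliding window is only one possible reading of "any Detection Time
period"; an implementation could equally use fixed, non-overlapping
periods.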
The first BFD NULL-Auth TLV processed by the receiver that has a
non-zero sequence number is used for bootstrapping the logic. Each
successive frame after this is expected to have a Sequence Number that
is one greater than the Sequence Number in the previous frame. When
the Sequence Number wraps around, it should start from 1 instead of
0.
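The bootstrapping and wrap-around rules above can be sketched as
follows. This is a minimal illustration, not a definitive
implementation. The names SeqTracker, next_seq and SEQ_MAX are
hypothetical. It assumes a 32-bit Sequence Number field, in-order
delivery with no duplicate frames, and that a zero sequence number
received after bootstrapping is simply ignored; none of these
assumptions is stated in this document.

```python
SEQ_MAX = 0xFFFFFFFF  # assumption: 32-bit Sequence Number field


def next_seq(seq: int) -> int:
    """Sequence number expected after seq; wraps to 1, never to 0."""
    return 1 if seq == SEQ_MAX else seq + 1


class SeqTracker:
    """Track Sequence Numbers on one BFD session and count missed frames."""

    def __init__(self):
        self.last_seq = None  # None until a non-zero sequence number is seen
        self.lost = 0         # total frames missed since bootstrap

    def on_frame(self, seq: int) -> int:
        """Process one received frame; return frames missed before it."""
        if seq == 0:
            return 0  # assumption: zero never bootstraps and is ignored
        if self.last_seq is None:
            self.last_seq = seq  # first non-zero value bootstraps the logic
            return 0
        expected = next_seq(self.last_seq)
        # Valid values are 1..SEQ_MAX, so the cycle has SEQ_MAX entries.
        missed = (seq - expected) % SEQ_MAX
        self.lost += missed
        self.last_seq = seq
        return missed


tracker = SeqTracker()
tracker.on_frame(SEQ_MAX - 1)  # bootstrap
print(tracker.on_frame(2))     # frames SEQ_MAX and 1 were missed
```

Because the modulus treats every difference as a forward gap, a
reordered or duplicated frame would be reported as a very large loss.
A real receiver would need a policy for that case.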
Other than the concerns raised in BFD, there
are no new concerns with this proposal.
The authors would like to thank Nobo Akiya, Jeffery Haas, Peng Fan,
Dileep Singh, Basil Saji, Sagar Soni and Mallik Mudigonda who also
contributed to this document.