Network Working Group M. Smith
Internet-Draft R. Kandula
Intended status: Informational S. Appala
Expires: November 25, 2020 Cisco Systems
May 24, 2020

Scalable-Group Tag eXchange Protocol (SXP)
draft-smith-kandula-sxp-10

Abstract

This document discusses scalable-group tag exchange protocol (SXP), a control protocol to propagate IP address to Scalable Group Tag (SGT) binding information across network devices.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 25, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may not be modified, and derivative works of it may not be created, except to format it for publication as an RFC or to translate it into languages other than English.


Table of Contents

1. Introduction

SXP stands for the Scalable Group Tag (SGT) eXchange Protocol. Scalable Group Tags are assigned to endpoints connecting to the network that have common network policies. Each Scalable Group Tag is identified by a unique SGT value. The SGT to which an endpoint belongs can be assigned statically or dynamically, and the SGT can be used as a classifier in network policies. SXP is a control-plane mechanism used to transport an endpoint's SGT along with the IP address from one SGT-aware network device to another. The data that SXP transports is referred to as the IP-SGT binding in the rest of the document.

Conveying this information allows SGT-aware network devices to make policy decisions based on the role or intent of an endpoint, which is expressed as a Scalable Group and represented by an SGT.

Implementations currently use SGT classifiers to apply policy enforcement in a number of ways -

In addition to providing policy enforcement, implementations also currently use SGT functions to -

1.1. Terminology

This document frequently uses the following terms:

SGT Scalable Group Tag

SXP Scalable-Group Tag (SGT) eXchange Protocol

IP-SGT The IP Address to SGT binding that is exchanged over SXP connection

SXP Speaker The peer that sends the IP-SGT bindings over the SXP connection

SXP Listener The peer that receives the IP-SGT bindings over the SXP connection

Binding Database The database of IP-SGT bindings that the SXP Speaker uses to export to the peer

2. SXP Overview

SXP uses TCP as its transport protocol to set up SXP connection between 2 separate network devices. Each SXP connection has one peer designated as SXP speaker and the other peer as SXP listener. The peers can also be configured in a bi-directional (both) mode where each of them act as both speaker & listener. Connections can be initiated by either peer but binding information is always propagated from a speaker to a listener.

At a very high level, SXP comprises of the following -

- SXP connection establishment

- IP-SGT bindings propagation

A typical deployment scenario of SXP is illustrated below -

+---------------+                              +----------------+
|               |                              |                |
|               |                              |                |
|               |                              |                |
| SXP Speaker   |----------------------------->| SXP Listener   |    
|               |     IP-SGT Bindings          |                |
|               |                              |                |
|               |                              |                |
+---------------+                              +----------------+
       |                                                |
       |                                                |
       |                                                |
       |                                                |
       +------------------------------------------------+

                        SXP Connection

3. SXP Operation

The SXP connections are point-to-point and uses TCP as the underlying transport protocol. The summary of operations is illustrated in each of the sections below.

3.1. SXP Connection Management

An SXP connection consists of the following parts:

3.1.1. SXP Connection

3.1.2. SXP Message integrity/authenticity

3.1.2.1. SXP Connection Using MD5

The connection peers on the 2 network devices supply the same SXP password to the TCP layer which will authenticate all further messages using the MD5 algorithm (RFC 2385). The SXP payload is not encrypted because the main requirement is for the receiving device to determine that the message originated from a valid source and not to prevent snooping of message (discussed in security considerations section) payload. SXP uses the underlying TCP MD5 option for message integrity/authenticity. The TCP MD5 is exchanged (negotiated) during the initial TCP 3 way handshake. Both sides (peers) are pre-configured with the keys (password) which will be used for forming the MD5 digest. Each packet (segment) needs to be "signed" with the MD5 digest (16 bytes) formed using the password as the key.

Whenever the password needs to be changed for an existing TCP connection, the peer resets the existing TCP connection and sets up a new TCP connection with the new password. Changing the default password will cause all SXP connections using the default password to be re-established.

3.1.2.2. SXP Connection Using TCP-AO

The connection peers on the 2 network devices can authenticate each other using TCP-AO (RFC 5925). SXP uses the underlying TCP-AO option for message integrity/authenticity. The TCP-AO option is exchanged (negotiated) during the initial TCP 3 way handshake. Both sides (peers) are pre-configured with the keys which will be used for forming the HMAC. The TCP module will populate the TCP-AO option in each outgoing segment by generating the HMAC using the traffic keys. On the receiver, TCP will generate the HMAC using the traffic keys and incoming segment header to validate the HMAC in the incoming TCP-AO option.

Whenever the TCP-AO key-chain needs to be changed for an existing TCP connection, the peer resets the existing TCP connection and sets up a new TCP connection with the new key-chain. Changing the default key-chain will cause all SXP connections using the default key-chain to be re-established.

3.1.3. SXP Connectivity Discovery and Connection Recovery

SXP uses the TCP keep alive mechanism with the default behavior to maintain SXP connectivity. A SXP device attempts to maintain connectivity with each peer. In case of failures, the SXP device continues to retry connection setup with all the peers with which the SXP connections have not been established. This continues until either a connection is established or the peer is removed from the peer set by a change in configuration. This mechanism ensures automatic recovery of the SXP connection once the connectivity issue is resolved. The SXP connection can be re-established by either device and hence the connection can be brought up sooner without relying on the retry timers of the peer devices. Note that this is just an optimization and not a requirement for SXP connection recovery. If both ends of an SXP connection set up the TCP connection at the same time, the end with source IP address higher than the peer IP address wins: i.e. the TCP connection initiated from that end is kept and the other TCP connection is torn down.

3.1.4. SXP Connection Setup Sequence

When a SXP connection is configured, it initiates a TCP connection set-up request. SXP communication starts after TCP connection has been established between the two network devices. The end that initiates the TCP connection starts the exchange by sending the OPEN message to negotiate SXP version, SXP mode, and if that end is a listener, SXP capabilities attribute. Capabilities advertisement and Capabilities Attribute are described with additional details in section 5.4.1. The receiving device responds with an OPEN_RESP message that includes similar attributes. The connection can only be set-up with one end configured as a speaker and the other end configured as a listener. See the following sections regarding version negotiation and capability exchange process descriptions. Along with version negotiation, the hold time required for the connection is also negotiated. The following sections explain the hold time negotiation in detail.

After SXP connection is established, binding information can be sent from speaker to listener with UPDATE message. Whenever the SXP connection has to be closed the underlying TCP connection is closed.

3.1.5. SXP Connection States

An SXP connection maintains following states: OFF, PENDING_ON, DELETE_HOLD_DOWN, ON.

OFF: In this state, a connection initiation has not been started. This is the only state that an SXP connection will retry establishing the TCP connection.

PENDING_ON: In this state, an SXP OPEN message has been sent to the peer and a response from the peer SXP is expected.

DELETE_HOLD_DOWN: In this state, a connection that was previously in ON state has been terminated. Only a listener can be in this state.

ON: SXP is in ON state when SXP OPEN or SXP OPEN RESP message has been received. SXP connection has been setup successfully. An SXP connection will only propagates bindings in the ON state.

                         +-------+
               T7        |       |---------------+
           +------------>| OFF   |               |T1
           |             |       |<---------+    |
           |             +-------+   T2     |    | 
       +--------+          |   A           +-----v-+ 
       |DELETE  |          |   |           |PENDING | 
       |HOLDDOWN|        T5|   | T4        |  ON    | 
       +---A----+          |   |           +--------+ 
           |             +-v-----+              |
           |      T6     |  ON   |              |T3
           +-------------|       |<-------------+ 
                         +-------+                        

T1-T7: State Transitions

T1: The connection sends a TCP connect request and an SXP OPEN message to the peer. The connection then transits to the PENDING ON state.

T2: TCP connection setup failure;

T3: The connection received an SXP OPEN RESPONSE or SXP OPEN message. If the delete hold down timer is running, the timer is stopped and the reconciliation timer is started.

T4: TCP connection failure; SXP message process failure; SXP configuration change.

T5: The TCP connection is setup and an SXP OPEN message is received. If the delete hold down timer is running, the timer is stopped and the reconciliation timer is started.

T6: TCP connection failure when a connection is in listener mode. The delete hold down timer is started with this state transition.

T7: The delete hold down timer expired or TCP connection setup failure.

In failure scenarios, the connection transits to OFF state and an attempt is made to re-establish the connection.

3.2. Binding Database

A system should have a master binding database for reconciling binding information from multiple SXP connections. The database should consist of a single binding per IP address and there should be a mechanism to handle same bindings from multiple SXP connections.

3.2.1. SXP Learned IP-SGT Binding recovery

When an SXP connection goes down on a network device, SXP continues to learn/advertise IP-SGT bindings on other SXP connections that are alive. If the device has learnt any bindings from the peer where the SXP connection goes down, a delete hold down timer is started for that connection:

4. Message Formats

The SXP Messages and their formats are described in detail in this section.

4.1. Bit and Octet Numbering Convention

Throughout this section, the following numbering convention is used to depict the SXP message formats:

Each diagram depicts the format and size of each field in bits. Implementations MUST send the bits in each diagram as they are shown, traversing the diagram from top to bottom and then from left to right within each line (which represents a 32-bit quantity). Multi-byte fields representing numeric values must be sent in network (big endian) byte order. Descriptions of bit field (e.g., flag) values are described referring to the position of the bit within the field. These bit positions are numbered from the most significant bit through the least significant bit, so a 1-octet field with only bit 0 set has the value 0x80.

4.2. SXP Message Header

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
 |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 |                SXP Message Length (4 octets)                  |
 +---------------------------------------------------------------+
 |                SXP Message Type (4 octets)                    |
 +---------------------------------------------------------------+
 |                SXP payload (variable)                         |
 +---------------------------------------------------------------+ 
 

Field Length Description
SXP Message Length 4 This 4-octet unsigned integer indicates the total length of the message, including the header but excluding authentication tag. The maximum message length is 4096
SXP Message Type 4 This 4-octet unsigned integer indicates the type code of the SXP message. The following type codes are defined: 1 – OPEN 2 - OPEN_RESP 3 – UPDATE 4 – ERROR 5 – PURGE_ALL 6 - KEEPALIVE
SXP Payload Variable SXP message payload is an optional field. Its content when present depends on the message type

4.3. Attribute Formats

SXP payloads can contain a variable number of fields. Some fields are fixed and mandatory to support. Other fields are called attributes and they have an extended {Type, Length, Value} (TLV) format in order to allows for gradual introduction of a new feature without requiring a protocol version change. Each attribute is tagged and includes a length field. This allows for newer versions of SXP to add capabilities and co-exist with old versions of SXP in the same deployment. Each attribute includes a Flags field to control processing. The format of an attribute {flags, type, length, value} (TLV) is encoded as follows:

Compact TLV

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |  Type         |  TLV Length   |               |                                                                      
   +-+-+-+-+-+-+-+-+---------------+---------------+               |
   |    Value (Variable: TLV Length octets long)                   |
   +---------------------------------------------------------------+

Compact Extended Length TLV

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |  Type         |  TLV Extended Length          |                                                                        
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |    Value (Variable: TLV Extended Length octets long)          |
   +---------------------------------------------------------------+

Non-Compact TLV

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|                  Type                                 |
   | |T| |0|                                                       |                                                                        
   +-+-+-+-+-+-+-+-+-----------------------------------------------+
   |                  Non-Compact TLV Length                       |
   +---------------------------------------------------------------+
   |    Value (Variable: Non-Compact TLV Length octets long)       |
   +---------------------------------------------------------------+
Flags Bits Description
Optional (O) 0 1 - Attribute is optional and MAY be ignored 0 – Attribute is mandatory (well-known and MUST be recognized)
Non-Transitive (N) 1 0 – Attribute shall be forwarded even if it is not recognized 1 – Attribute shall not be forwarded
Partial (P) 2 1 – When a transitive attribute is forwarded but not recognized 0 – Otherwise
Compact (C) 3 1 – TLV is using compact Type and Length encoding 0 – TLV is encoded using all the 4 octets for Type and Length. All other flags are 0
Extended Length (E) 4 0 – Length is a single octet long 1 – Length is 2 octets long
Reserved 5-7 Reserved for future use. Must be transmitted as 0 and ignored on receive
Field Length Description
Flags 1 Attribute Flags
Type 1 or 4 TLV Type
Length 1, 2 or 4 Length (in octets) of the TLV Value field (i.e. not including the 3, 4, or 8-octets attribute header. Unsigned integer in the following range: [0..255] - For Compact non-Extended Length attribute. [256..4084] – For Compact Extended Length attribute.
Value Variable TLV values, which depend on the TLV type

If an optional non-transitive attribute is unrecognized, it is quietly ignored.

If an optional transitive attribute is unrecognized, the Partial bit (the third high-order bit) in the attribute flags octet is set to 1, and the attribute is retained for export according to the scope in which the attribute appears.

Attribute Length field of Compact attributes SHOULD be only as large as necessary to hold an unsigned integer that specifies the number of octets in the attribute Value field. This implies that Extended-Length field SHOULD contain a value greater than 256 (or the first octet of an Extended-Length field SHOULD be non-zero). The largest allowed value of an Extended-Length field is 4084, considering maximum SXP message length of 4096. The largest allowed value of a non-Compact attribute Length field is thus 4080. SXP Listener MUST accept and process Compact Extended-Length attributes with Length field in the range [0..255] without issuing any errors. In the rest of this document diagrams showing usage of Compact normal and/or Extended-Length attribute are only illustration of expected usage. They do not imply that only the illustrated class of attribute or attribute Length is supported.

Attribute Value field consists of well defined sub-fields which are defined in this protocol spec. Attribute Value field MAY contain additional, possibly optional, sub-attributes which are encoded like other attributes.

4.4. SXP OPEN and OPEN_RESP Message

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |                Version (4 octets)                             |
   +---------------------------------------------------------------+
   |                SXP Mode (4 octets)                            |
   +---------------------------------------------------------------+
   |                SXP Node-ID Attribute (7 octets)               |
   +---------------------------------------------------------------+
   |                Capabilities Attribute (variable)              |
   +---------------------------------------------------------------+
   |                Other Optional Attributes (variable)           |
   +---------------------------------------------------------------+

Field Length Description
Version 4 Unsigned integer value. The SXP Version is set to the highest version that is supported on the network device
SXP Mode 4 Unsigned integer value used to convey the SXP role that the device uses on the SXP connection. The following values are used: 1 – SXP Speaker 2 – SXP Listener
SXP Node-Id 7 REQUIRED non-transitive attribute (in SXP version 4 and above) an SXP Speaker SHALL include in OPEN or OPEN_RESP to convey its unique 32 bits Node ID. The attribute type used for Node ID Attribute is 5
Capabilities Variable REQUIRED non-transitive attribute (in SXP version 4 and above) an SXP Listener SHALL include in OPEN or OPEN_RESP to convey its capabilities for processing data on the connection

4.4.1. Node-ID

In the OPEN or OPEN_RESP message, Node-ID has to be included only by the speaker. The listener must not send the Node-ID in OPEN message and the speaker has to return an error if it receives a Node-ID from a listener.

SXP Node-ID with value 0 is reserved for denoting bindings that are received from connections that use older versions of the SXP protocol.

The Implementation Note Section has more details on how the Node-ID value is selected.

4.4.2. Capabilities Advertisement

SXP Listeners are REQUIRED to include the Capabilities attribute when they send an OPEN message or when they respond to such message with OPEN_RESP message. The Capabilities attribute describes the SXP features supported by the Listener. This allows the SXP Speaker to tailor the behavior of the connection to what the Listener could process. An SXP speaker receiving Capabilities attribute from an SXP Listener SHALL NOT send to that listener in UPDATE messages any mandatory attributes that were not indicated as supported in the Capabilities attribute. However, the Listener must be able to process the optional attributes to act as a relay agent. If an SXP speaker receives from its listener peer a capability that it does not itself support or recognize, it MUST ignore that capability. In particular, ERROR message MUST NOT be generated and the SXP session MUST NOT be terminated in response to reception of a capability that is not supported by the local SXP speaker.

4.4.2.1. Capabilities Attribute

This is a REQUIRED attribute that is used by an SXP Listener to convey to its peer SXP Speaker the list of capabilities supported by the listener. The attribute contains one or more triples <Capability Code, Capability Length, Capability Values>:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | | Capabilities  |TLV Length (C=1|               |
   | | | | | | | | | Type 6        |    EL=0)      |               |
   |0|1|0|1| | | | |               +---------------+---------------+
   | | | | | | | | |               |TLV Extended Length (C=1 EL=1) |                                                          
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |Cap Code 1     |Cap Length 1   |                       
   |---------------+---------------+-------------------------------+
   |             Capability 1 Value (Variable)                     |
   +---------------------------------------------------------------+
   x                      *****                                    x
   x---------------------------------------------------------------x
   |Cap Code N     |Cap Length N   |                       
   |---------------+---------------+-------------------------------+
   |             Capability N Value (Variable)                     |
   +---------------------------------------------------------------+
   

Field Length Description
Capability Code 1 Unsigned integer that unambiguously identifies individual capabilities
Capability Length 1 Unsigned integer that contains the length of the Capability Value field in octets
Capability Value Variable Variable-length field that is interpreted according to the value of the Capability Code field. Could have zero length

The encoding and meaning of the capabilities defines by this version of the SXP protocol spec are as follows:

Capability Name Capability Code Capability Length Description
IPv4 Unicast 1 0 Listener is capable of receiving and handling IPv4 unicast bindings
IPv6 Unicast 2 0 Listener is capable of receiving and handling IPv6 unicast bindings
Subnet Bindings 3 0 Listener is capable of receiving and handling subnet bindings

4.4.2.2. Handling of Capabilities Attribute

Subnet Bindings – This capability indicates to the speaker that subnet bindings SHOULD be exported as regular bindings without any expansion. A listener that fully supports bindings associated with IP prefixes other than host addresses (IPv4 /32 or IPv6/128) MUST include the Subnet Bindings capability.

A listener that does not support subnet bindings in its forwarding path but has the capability to expand subnet bindings locally MUST send Subnet Bindings capability in order to prevent the listener from unnecessarily expanding subnet bindings. A listener which does not support subnet bindings but might be able to support the volume of expanded bindings MAY include Subnet Bindings capabilities.

A speaker receiving such capability MAY expand subnet bindings exhaustively or selectively and export individual bindings within a subnet. The expansion of subnet bindings COULD be limited by implementation to a reasonable subnet size with a limit on the total number of bindings expanded. Therefore there could be subnet bindings which are not expanded.

When a speaker does not expand a subnet binding it SHOULD export it unmodified as a subnet binding. This will allow the listener to propagate such bindings even when it cannot use it locally.

When a speaker expands a subnet binding it MUST NOT export it as both subnet and expanded bindings since the listener or any further SXP peers along the SXP propagation path cannot distinguish expanded bindings from other host bindings and relate them to the subnet bindings they are originate from.

IPv6 Unicast – This capability indicates to the speaker that the listener is capable of handling and using bindings associated with IPv6 unicast addresses. A listener that support IPv6 bindings MUST include this capability. When this capability is not included, the speaker MUST NOT send IPv6 bindings as mandatory attributes. A speaker, however, SHOULD send IPv6 bindings within Optional Transitive attributes when the IPv6 Unicast capability is omitted. This will allow an SXP node to relay IPv6 bindings even when it could not process them locally.

IPv4 Unicast – This capability should be handled in the same way as IPv6 Unicast capability is handled. It is unlikely for any SXP listener to omit IPv4 Unicast capability at least until pure IPv6 networks would become common.

4.4.3. Keep-alive and Hold Time Negotiation

SXP uses TCP-based, keep-alive mechanism to determine if a connection is live. This mechanism has been used in all prior versions of the protocol. SXP version 4 adds a discretionary negotiated keep-alive mechanism within the protocol itself in order to provide more predictable and timely detection of connection loss.

SXP connections are asymmetric with almost all of the protocol messages (except for OPEN/OPEN_RESP and ERROR) being sent from an SXP speaker to an SXP listener.

SXP listener could also keep potentially large volume of state per connection which includes all the binding information learnt on a connection. Therefore, it is only meaningful to have a keep-alive mechanism that allows a listener to detect the loss of connection with a speaker.

The mechanism is based on two timers:

Hold Timer – Used by a listener for detection of elapsing time without successive KEEPALIVE and/or UPDATE messages from a speaker.

Keep-alive Timer – Used by a speaker to trigger sending of KEEPALIVE messages during intervals when no other information is exported via UPDATE messages.

The keep-alive timer is one-thirds of the negotiated hold-timer value.

Hold Timer for the keep-alive mechanism MAY be negotiated during the OPEN/OPEN_RESP exchange at connection setup.

The diagram below illustrates the Hold Time negotiation process.

                     +------+                              +------+
                     | Min  |                              | Max  |
                     |      |                              |      |
                     +------+                              +------+
  Listener               |                                    |
                         |                                    |
        -----------------V<---------------------------------->V--------
  Speaker |                               |                        |
          |                               |                        |
         1|                              2|                        |3
        +-v---------+                  +--v--------+      +--------v---+
        | Selected  |                  | Selected  |      |Unacceptable|
        |  Hold     |                  |  Hold     |      | Hold       |
        |  Time     |                  |  Time     |      | Time       |
        |           |                  |           |      |            |
        +-----------+                  +-----------+      +------------+

Connection initiated by Listener

Connection initiated by Speaker

4.4.3.1. Hold-Time Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |Hold-Time (7)  | TLV Length 2/4|       
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   | Hold Time Minimum Value       |  Hold Time Maximum Value      |
   +---------------------------------------------------------------+

Field Length Description
Optional(O) 1 Bit Value is 0
Non-Transitive(N) 1 Bit Value is 0
Partial (P) 1 Bit Value is 0
Compact (C) 1 Bit Value is 1
Extended Length (E) 1 Bit Value is 0
Attribute Type 1 Hold-Time (7)
Attribute Value Length 1 2 - when only Hold Time Minimum Value is present. 4 – when both Hold Time Minimum and Maximum values are present
Hold Time Minimum Value 2 Unsigned integer indicating the number of seconds the sender proposes as a lower bound for the Hold Timer period or the Hold Time period selected by a responder. 0 or at least 3 seconds
Hold Time Maximum Value 2 Unsigned integer indicating the number of seconds the sender proposes as a upper bound for the Hold Timer period. MUST be greater than Hold Time Minimum Value

Hold-Time is a well-known discretionary attribute an initiator of a connection MAY include in an OPEN message and a responder MAY be REQUIRED to include in an OPEN_RESP in order to negotiate the timer periods used for keep-alive mechanism. The Hold-Time attribute has to be marked as Non-Transitive and an error has to be returned if it's not marked as Non-Transitive.

4.5. SXP UPDATE Message

Prior versions of SXP protocol specify that an UPDATE message contains one or more SXP mappings. This version of SXP protocol generalizes the UPDATE message. An UPDATE message MAY contain one or more attributes. Some attributes are well-known and additional attributes COULD be added in the future without affecting existing implementations of SXP version 4. This document defines the format of the attributes used within UPDATE message and how they are combined to carry the SXP binding information.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +---------------------------------------------------------------+
   |                  Attribute 1 (Variable)                       |
   +---------------------------------------------------------------+
   x                    *****                                      x
   +---------------------------------------------------------------+
   |                  Attribute N (Variable)                       |
   +---------------------------------------------------------------+

4.5.1. UPDATE Attributes

4.5.1.1. IPv4-Delete-Prefix Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |IPv4-Delete-   | TLV Length    |Reserved       |
   | | | | | | | | | Prefix        |  C = 1 EL = 0 |               |
   |0|0|0|1| | | | | Type = 13     +-------------------------------+
   | | | | | | | | |               |TLV Extended Length (C=1 EL=1) |         
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |                 IPv4 Prefix 1 (Variable)                      |
   +---------------------------------------------------------------+
   x                    *****                                      x
   +---------------------------------------------------------------+
   |                 IPv4 N Value (Variable)                       |
   +---------------------------------------------------------------+

This attribute is used to convey the withdrawal of one or more IPv4 bindings which are unambiguously identified by their IPv4 prefix. This attribute contains a set of IPv4 prefixes. Each IPv4 prefix is encoded as a 2-tuple of the form <length, prefix>, whose fields are encoded as:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |  Length =     |                Reserved                       |
   |  [0..32]      |                                               |
   +---------------+-----------------------------------------------+
   |                 Prefix                                        |                                                                 
   +---------------------------------------------------------------+
   | 0 < Length    | 9 <= Length   | 17 <= Length  |25 <= Length   |
   +---------------+---------------+---------------+---------------+
   

Field Length (in Octets) Description
Length 1 Unsigned integer. The length in bits of the IP address prefix. A length of zero indicates a prefix that matches all IP addresses (with Prefix field filled with zero in all octets)
Prefix Variable (Length/8) An IP address prefix, followed by the minimum number of trailing bits needed to make the end of the field fall on an octet boundary. The value of trailing bits is irrelevant

4.5.1.2. IPv6-Delete-Prefix Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |IPv4-Delete-   | TLV Length    |Reserved       |
   | | | | | | | | | Prefix        |  C = 1 EL = 0 |               |
   |0|0|0|1| | | | | Type = 14     +-------------------------------+
   | | | | | | | | |               |TLV Extended Length (C=1 EL=1) | 
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |                 IPv6 Prefix 1 (Variable)                      |
   +---------------------------------------------------------------+
   x                    *****                                      x
   +---------------------------------------------------------------+
   |                 IPv6 N Value (Variable)                       |
   +---------------------------------------------------------------+

This attribute is used to convey the withdrawal of one or more IPv6 bindings which are unambiguously identified by their IPv6 prefix. This attribute contains a set of IPv6 prefixes. Each IPv6 prefix is encoded as a 2-tuple of the form <length, prefix>, whose fields are encoded as:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |  Length =     |                  Reserved                     |
   |  [0..128]     |                                               |
   +---------------+-----------------------------------------------+
   |                 Prefix[1..4]                                  |                                                                 
   +---------------------------------------------------------------+
   | 0 < Length    | 9 <= Length   | 17 <= Length  |25 <= Length   |
   +---------------+---------------+---------------+---------------+
   |                 Prefix[5..8]                                  |                                                                 
   +---------------------------------------------------------------+
   | 33 < Length   | 41 <= Length  | 49 <= Length  |57 <= Length   |
   |               |               |               |               |
   +---------------+---------------+---------------+---------------+
   |                 Prefix[9..12]                                 |                                                                 
   +---------------------------------------------------------------+
   | 65 < Length   | 73 <= Length  | 81 <= Length  |89 <= Length   |
   +---------------+---------------+---------------+---------------+
   |                 Prefix[13..16]                                |                                                                 
   +---------------------------------------------------------------+
   | 97 < Length   | 105 <= Length | 113 <= Length |121 <= Length  |
   +---------------+---------------+---------------+---------------+
  

Field Length (in Octets) Description
Length 1 Unsigned integer. The length in bits of the IP address prefix. A length of zero indicates a prefix that matches all IP addresses (with Prefix field filled with zero in all octets)
Prefix Variable (Length/8) An IP address prefix, followed by the minimum number of trailing bits needed to make the end of the field fall on an octet boundary. The value of trailing bits is irrelevant

4.5.1.3. Peer-Sequence Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |Peer-Sequence  | TLV Length    |      
   |0|0|0|1|0| | | | Type = 16     |    4 * N      |               
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |                         SXP ID 1                              |
   +---------------------------------------------------------------+
   |                         SXP ID 2                              |
   +---------------------------------------------------------------+
   x                         *****                                 x
   +---------------------------------------------------------------+
   |                         SXP ID N                              |
   +---------------------------------------------------------------+
  

This attribute contains the SXP nodes through which exported bindings has traversed. Each SXP node is identified by a 4 octet SXP ID. SXP ID MUST be unique for every SXP node in an SXP deployment. As exported bindings are propagated by an SXP Speaker, the speaker SHALL prepend its own SXP ID to the sequence of SXP IDs of the Peer-Sequence attribute that was received by an SXP Listener on the same node. If the exported binding is originated by the local node, the Peer-Sequence will contain a single SXP ID identifying the local node. When an UPDATE message is received by an SXP Listener, the first SXP ID within each Peer-Sequence attribute is the SXP node ID of the SXP Speaker on the remote end of the connection the message is received on. The last SXP ID is the node ID of the SXP Speaker that originates the bindings associated with a Peer-Sequence attribute. UPDATE message SHALL contain one Peer-Sequence attribute and MAY contain multiple Peer-Sequence attributes. Each Peer-Sequence attribute is associated with the bindings that follow it up to the next Peer-Sequence attribute if more than one Peer-Sequence attribute is present or up to the end of the UPDATE message.

4.5.1.4. Source-Group-Tag Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | |SGT Type = 17  | TLV Length 2  |    Reserved   |
   |0|0|0|1|0| | | |  Group Tag    |     2         |               |
   +-+-+-+-+-+-+-+-+---------------+---------------+---------------+
   |       SGT Value               |            Reserved           |
   +-------------------------------+-------------------------------+
 

Field Length (in Octets) Description
Attribute Type 1 Scalable-Group-Tag (17)
Attribute Value Length 1
SGT Value 2 Unsigned integer value of an SGT

4.5.1.5. IPv4-Add-Prefix Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | |IPv4-Add-Prefix|TLV Length (C=1|     Reserved  |
   |O|N|P|C|E| | | |  Type = 11    |    EL=0)      |               |
   | | | | | | | | |               +---------------+---------------+
   |0|0|0|1| | | | |               |TLV Extended Length (C=1 EL=1) |                                                          
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |             IPv4 Prefix 1 (Variable)                          |
   +---------------------------------------------------------------+
   x                      *****                                    x
   |---------------------------------------------------------------+
   |             IPv4 N Value (Variable)                           |
   +---------------------------------------------------------------+

This attribute is used to convey the binding of one or more IPv4 prefixes with an SGT value or other information. The encoding of each IPv4 prefix is described above in IPv4-Delete-Prefix Attribute section.

The unbundling of IP prefixes from the information associated with them achieves two major objectives:

The associated SGT is the value specified by the latest preceding occurrence of Scalable-Group-Tag attribute within the same UPDATE message. An IPv4-Add-Prefix attribute SHALL be preceded with an occurrence of Peer-Sequence attribute. The latest preceding occurrence provides the path along which the bindings of all the prefixes contained in the IPv4-Add-Prefix have been traversed.

4.5.1.6. IPv6-Add-Prefix Attribute

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | |IPv6-Add-Prefix|TLV Length (C=1|     Reserved  |
   |O|N|P|C|E| | | |  Type = 12    |    EL=0)      |               |
   | | | | | | | | |               +---------------+---------------+
   |0|0|0|1| | | | |               |TLV Extended Length (C=1 EL=1) |                                                          
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   |             IPv6 Prefix 1 (Variable)                          |
   +---------------------------------------------------------------+
   x                      *****                                    x
   |---------------------------------------------------------------+
   |             IPv6 N Value (Variable)                           |
   +---------------------------------------------------------------+

This attribute is used to convey the binding of one or more IPv6 prefixes with an SGT value or other information. The encoding of each IPv6 prefix is described above in IPv6-Delete-Prefix Attribute section. The handling of this attribute and its relationship with preceding attributes such as Peer-Sequence or Scalable-Group-Tag are the same as what is described above in IPv4-Add-Prefix section except for a different attribute type (12) and possibly longer IPv6 prefixes.

4.5.1.7. IPv4-Add-Table Attribute

This attribute provides a flexible tabular representation of bindings information. It is provided for efficient aggregation of multiple bindings information when an implementation cannot aggregate multiple IP prefixes that are associated with the same SGT or other common attributes.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | | IPv4-Add-Table| TLV Length    |  Reserved     |
   |0|0|0|1| | | | |  Type = 21    | (C = 1 EL = 0)|               |
   | | | | | | | | |               |-------------------------------+
   | | | | | | | | |               | TLV Extended Length (C=1 EL=1)|  
   +-+-+-+-+-+-+-+-+---------------+---------------+---------------+
   | Number of     |                 Reserved                      |
   | Columns C     |                                               |
   +---------------+---------------+---------------+---------------+
   |'1'Column Type | '1'Column Type| '1'Column     |'2'Column Type |
   |   Flags       |               |   Width       |   Flags       |
   +---------------+---------------+---------------+---------------+
   |'2'Column Type | '2'Column     |                               |
   |               |    Width      |           ***                 |
   +---------------+---------------+---------------+---------------+
   x                           ***                                 x
   +---------------+---------------+---------------+---------------+
   |'C'Column Type | 'C'Column Type| 'C'Column     |   Reserved    |
   |   Flags       |               |   Width       |               |
   +---------------+---------------+------------------------------+--+
   | '1''1' Value  |     ***       |'1''C'Value    |'1'IPv4 Prefix   |
   |               |     ***       |               |    (Variable)   |
   +---------------+---------------+---------------+-----------------+
   x         ***   x     ***       x    ***        x         ***     x
   x         ***   x     ***       x    ***        x         ***     x
   x---------------+---------------+---------------+-----------------x
   | 'R''1' Value  |     ***       |'R''C'Value    |'R'IPv4 Prefix   |
   |               |     ***       |               | (Variable)      |
   +---------------+---------------+---------------+-----------------+                                                                       

 

Field Bits Description
Number of Columns 1 1 - Unsigned integer which specifies the number of columns of information (not including the trailing IP prefix) associated with each prefix
[i] Column Type Flags 1 0 – An Attribute Type Flags value defining the flag of the attribute in the I’th column on each row
[i] Column Type 1 1 – An Attribute Type value defining what each row have in the i’th column position
[i] Column Width 1 1 – The length in octets of the attribute value in the i’th column on each row
[j] IPv4 Prefix Variable 0 – An IP prefix which is associated with the C-tuple of attribute values on the j’th row of the table. The format of is the same as the IP prefix field in the IPv4-Add-Prefix attribute

Below is an example of an IPv4-Add-Table attribute which contain a single SGT attribute value on each row.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | | IPv4-Add-     | TLV Length    |   Reserved    |
   | | | | | | | | |  Table        | (C = 1 EL = 0)|               |
   |0|0|0|1| | | | |  Type = 21    |-------------------------------+
   | | | | | | | | |               | TLV Extended Length (C=1 EL=1)|  
   +-+-+-+-+-+-+-+-+---------------+---------------+---------------+
   | Number of     |                Reserved                       |
   | Columns 1     |                                               |
   +-------------------------------+-------------------------------+
   |Scalable-Group | Column        |           Reserved            |
   | Tag Type = 17 | Width = 2     |                               |
   +-------------------------------+-------------------------------+--+
   |     '1'SGT Value              |'1'Prefix      |'1'IPv4 Prefix    |
   |                               | Length        |                  |
   +-------------------------------+---------------+------------------+
   x         ***                   x    ***        x         ***      x
   x-------------------------------+---------------+------------------x
   |     'R'SGT Value              |'R'Prefix      |'R'IPv4 Prefix    |
   |                               | Length        |                  |
   +-------------------------------+---------------+------------------+                                                                              

 

4.5.1.8. IPv6-Add-Table Attribute

This attribute provides a flexible tabular representation of IPv6 bindings information.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   |O|N|P|C|E| | | | IPv6-Add-     | TLV Length    |    Reserved   |
   | | | | | | | | |  Table        | (C = 1 EL = 0)|               |
   |0|0|0|1| | | | |  Type = 22    |-------------------------------+
   | | | | | | | | |               | TLV Extended Length (C=1 EL=1)|  
   +-+-+-+-+-+-+-+-+---------------+-------------------------------+
   | Number of     |                Reserved                       |
   | Columns C     |                                               |
   +---------------+---------------+---------------+---------------+
   |'1'Column Type | '1'Column Type| '1'Column     |'2'Column Type |
   |   Flags       |               |   Width       |   Flags       |
   +---------------+---------------+---------------+---------------+
   |'2'Column Type | '2'Column     |                               |
   |               |    Width      |           ***                 |
   +---------------+---------------+---------------+---------------+
   x                           ***                                 x
   +---------------+---------------+---------------+---------------+
   |'C'Column Type | 'C'Column Type| 'C'Column     |    Reserved   |
   |   Flags       |               |   Width       |               |
   +---------------+---------------+-------------------------------+--+
   | '1''1' Value  |     ***       |'1''C'Value    |'1'IPv6 Prefix    |
   |               |     ***       |               |  (Variable)      |
   +---------------+---------------+---------------+------------------+
   x         ***   x     ***       x    ***        x         ***      x
   x---------------+---------------+---------------+------------------x
   | 'R''1' Value  |     ***       |'R''C'Value    |'R'IPv6 Prefix    |
   |               |     ***       |               |  (Variable)      |
   +---------------+---------------+---------------+------------------+                                                                       

 

The format and handling of IPv6-Add-Table attribute are the same as what is described above in IPv4-Add-Table section except that the attribute type (22) is used and the trailing IPv6 Prefix field in each row could be potentially longer than the corresponding trailing IPv4 Prefix fields.

4.5.2. UPDATE Message Samples

Single IPv4 host binding as exported by the first Speaker away from the originating node:

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
 |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+------
 |              SXP Message Length = 32                          |SXP
 +---------------------------------------------------------------+Msg
 |              SXP Message Type = UPDATE (3)                    |Header
 +---------------------------------------------------------------+------
 |O|N|P|C|E| | | |Peer-Sequence  | TLV Length    |   Reserved    |
 |0|0|0|1|0| | | | Type = 16     |    8          |               | 
 +-+-+-+-+-+-+-+-+---------------+-------------------------------+Peer-
 |                  Local SXP ID                                 |Seq-
 |                                                               |Attr
 +---------------------------------------------------------------+
 |SXP ID of Speaker end of the connection on which it was        |
 |             originally received                               |
 +---------------------------------------------------------------+-----
 |O|N|P|C|E| | | |Scalable-Group-| TLV Length    |    Reserved   |
 | | | | | | | | | Tag           |    2          |               |Src-
 |0|0|0|1|0| | | | Type = 17     |               |               |Grp-
 +-+-+-+-+-+-+-+-+---------------+-------------------------------+Tag
 |    SGT Value = <sgt#>         |             Reserved          |Attr
 |                               |                               |
 +---------------------------------------------------------------+-----
 |O|N|P|C|E| | | |IPv4-Add-Prefix| TLV Length    |Prefix Length  |IPv4-
 | | | | | | | | | Type = 11     |    5          |     32        |Add-
 |0|0|0|1|0| | | |               |               |               |Pre-
 +-+-+-+-+-+-+-+-+---------------+-------------------------------+fix
 |                  IPv4 Host Address                            |Attr
 +---------------------------------------------------------------+-----
 

Multiple IPv4 host and subnet bindings sharing the same Peer-Sequence as exported by the first Speaker away from the originating node. 11 subnets bindings of length between 17 and 24 and as many host bindings as SXP total message size allowed are packed into this message:

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
 |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+------
 |              SXP Message Length = 4096                        |SXP 
 +---------------------------------------------------------------+Msg
 |              SXP Message Type = UPDATE (3)                    |Header
 +---------------------------------------------------------------+------
 |O|N|P|C|E| | | |Peer-Sequence  | TLV Length    |   Reserved    |
 |0|0|0|1|0| | | | Type = 16     |    8          |               |
 +-+-+-+-+-+-+-+-+---------------+-------------------------------+Peer-
 |                  Local SXP ID                                 |Seq-
 |                                                               |Attr
 +---------------------------------------------------------------+
 |SXP ID of Speaker end of the connection on which it was        |
 |             originally received                               |
 +---------------------------------------------------------------+-----
 |O|N|P|C|E| | | |IPv4-Add-Table | TLV Length    |    Reserved   |IPv4-
 | | | | | | | | |               |3 + 11x6 +572x7|               |Add-
 |0|0|0|1|0| | | | Type = 21     | = 4073        |               |Table
 +-+-+-+-+-+-+-+-+---------------+-------------------------------+Attri-
 |Number of      |               Reserved                        |-bute
 |Columns 1      |                                               |
 +---------------------------------------------------------------+------
 |Scalable       | Column        |         Reserved              |Col
 |Group Tag      |  Width = 2    |                               |Head
 |Type = 17      |               |                               |ers
 +-+-+-+-+-+-+-+-+---------------+-------------------------------+------
 | '1' SGT Value                 |'1'Prefix      |   Reserved    |
 |                               |Length  =      |               |
 |                               | [17..24]      |               |
 |-------------------------------+---------------+---------------+
 |        '1' IPv4 Subnet Prefix                 |   Reserved    |Subnet
 +-----------------------------------------------+---------------+Bind-
 *                                               *               |ings
 *                                               *   Reserved    |Rows
 * '2..10' <SGT Value, IPv4 Subnet Prefix> rows  *               |
 +-----------------------------------------------+---------------+
 | '11' SGT Value                |'11'Prefix     |    Reserved   |
 |                               |Length  =      |               |
 |                               | '17..24'      |               |
 |-------------------------------+---------------+---------------+
 |        '11' IPv4 Subnet Prefix                |   Reserved    |
 +-----------------------------------------------+---------------+----
 | '1' SGT Value                 |'1'Prefix      |  Reserved     |
 |                               |Length  = 32   |               |
 |-------------------------------+---------------+---------------+
 |                  '1'IPv4 Host Address                         |
 +-----------------------------------------------+---------------+
 x                                               |   Reserved    |Host
 x '2..571' <SGT Value, 32, IPv4 Host Address>   |               |Bind-
 x                                               +---------------+ing
 x                                                               |Rows
 x---------------------------------------------------------------+
 | '572' SGT Value               |'572'Prefix    |   Reserved    |
 |                               |Length  = 32   |               |
 +-------------------------------+---------------+---------------+
 |                  '572'IPv4 Host Address                       |
 +---------------------------------------------------------------+-----
   
 

4.6. SXP ERROR Message

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
   |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
   | |   0 (E=0)   |  0 (E=0)      |  non-Extended Error Code (E=0)|
   |E+-------------+---------------+-------------------------------|
   | |Error Code   |Error Sub-code |                               |
   | |  (E=1)      |  (E=1)        |                               | 
   +-+-----------------------------+                               |
   |             Data (Variable)                                   |
   +---------------------------------------------------------------+

Field Bits Length (in Octets) Description
Extended 0 1 – Extended Error format, which includes code, sub-code, and variable data 0 – Legacy non-Extended Error
Error Code [1..7] 0 Unsigned integer in the range [0..127] – indicates the type of ERROR
Error Sub-Code 1 Unsigned integer in the range [0..255] – Provides extended information about the nature of the reported error. Each Error code may have one or more Error Sub-codes associated with it. If no appropriate Error Sub-code is defined, then zero (Unspecified) value is used
non-Extended Error Code 2 Error code values: 0 – No error 1 - Version Mismatch 2 - Message Parse Error
Data Variable Message Length - 10 Additional data for diagnosing the reason for the ERROR message. The content depends on the values of Error Code and Error Sub-code

4.6.1. Error Codes

    +-----------+-----------+-------------------------------------+
    | Error     | Error     |          Symbolic Name              |
    |  Code     | Sub-code  |                                     |
    +-----------+-----------+-------------------------------------+ 
    |    1      |    -      |     Message Header Error            |
    |           |           |                                     |
    +-----------+-----------+-------------------------------------+
    |    2      |    -      |    OPEN Message Error               |
    |           |           |                                     |
    +-----------+-----------+-------------------------------------+  
    |    3      |    -      |     UPDATE Message Error            |
    |           |           |                                     |
    +-----------+-----------+-------------------------------------+
    |           |    1      |  Malformed Attribute List           |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    2      |  Unexpected Attribute               |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    3      |  Missing Well-known Attribute       |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    4      |  Attribute Flags Error              |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    5      |  Attribute Length Error             |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    6      |  Malformed Attribute                |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    7      |  Optional Attribute Error           |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    8      |  Unsupported Version Number         |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    9      |  Unsupported Optional Attribute     |
    |           |           |                                     |
    +           +-----------+-------------------------------------+
    |           |    10     |  Unacceptable Hold Time             |
    |           |           |                                     |
    +-----------+-----------+-------------------------------------+
   

The error codes and the error sub-codes are used to interpret the type of error and the specific attributes that have caused the error need not be sent back with the error message.

4.7. SXP PURGE-ALL Message

There is no payload corresponding to this message type.

When an SXP connection on which the local SXP node is a speaker is administratively deleted, SXP MUST send a PURGE-ALL message to the SXP listener at the remote end. This will provide the listener timely knowledge that the connection is about to be torn down and that it is not an intermittent loss of communication. The listener will immediately delete all bindings that were received on that connection and will not go through Delete Hold-Down timer before removal of bindings.

When SXP feature gets administratively disabled, SXP MUST send PURGE-ALL message on all the connections on which the local end is a speaker.

Upon receiving a PURGE-ALL message, an SXP listener MUST immediately delete all the bindings that were received on that connection. PURGE-ALL message should have the same effect as an UPDATE message with IPv4-Delete-Prefix or IPv6-Delete-Prefix attributes that contain all bindings which has been exported on the connection. A PURGE-ALL is simply an optimization of such delete at-once case.

4.8. SXP KEEPALIVE Message

There is no payload corresponding to this message type. This message is sent by the speaker to the listener side when KEEPALIVE timer expires.

5. Update Message Handling

An SXP UPDATE message may be received only in the ON state. Receiving an UPDATE message in any other state is an error. When an UPDATE message is received, each field is checked for validity.

5.1. UPDATE Message Validation

UPDATE message SHALL be organized as illustrated below:

   +-------------------------------------------------------------+
   |Zero or more global optional attributes which are unrelated  |
   |to any of the binding delete or add attributes or groups     |
   |             below them                                      |
   +-------------------------------------------------------------+
   |At most one each IPv4-Delete-Prefix or IPv6-Delete-Prefix    |
   +-------------------------------------------------------------+
   |Zero or more Del-IPv4 or Del-IPv6 (non-zero only when neither|
   |  IPv4-Delete-Prefix nor IPv6-Delete-Prefix are present)     |
   +-------------------------------------------------------------+

All errors detected while processing the UPDATE message MUST be indicated by sending the ERROR message with the Error Code UPDATE Message Error to the SXP speaker from which the UPDATE message was received. The Error Sub-code provides additional information about the error. UPDATE message is a sequence of attributes. Attributes could be further classified as compact, compact with extended length, or non-compact attributes. Each attribute consists of a fixed-size header, 3, 4, or 8 octets long respectively, which contains TLV Length field. The total length of each attribute is thus: Attribute Length = 3/4/8 + TLV Length If the length of any attribute is larger than the SXP Message length or the sum of Attribute Length of all attributes is larger than the SXP Message length, the Error Sub-code MUST be set to Malformed Attribute List. If any recognized attribute has Attribute Flags that conflicts with its Attribute Type, then the Error Sub-code MUST be set to Attribute Flags Error. The Data field MUST contain the erroneous attribute (flags, type, length, and value). If any recognized attribute has Attribute Length that conflicts with the expected length (based on its Attribute Type), then the Error Sub-code MUST be set to Attribute Length Error. The Data field MUST contain the erroneous attribute (flags, type, length, and value).

If any recognized attribute does not conform to the attribute specification in UPDATE Attributes section, then Error Sub-code MUST be set to Malformed Attribute. The Data field MUST contain the erroneous attribute (flags, type, length, and value). Cases of malformed attributes:

• IP Prefix length field is outside of the permitted range [0..32]/[0..128] for IPv4/IPv6 respectively.

• Implied length of IP Prefix field which extends the prefix beyond the extent of the containing attribute.

If any attribute appears more than once when at most one occurrence is permitted then the Error Sub-code MUST be set to Malformed Attribute List.

If any attribute appears in a location where it is not expected (such as IPv4-Add-Prefix/IPv6-Add-Prefix without preceding Scalable-Group-Tag) then the Error Sub-code MUST be set to Malformed Attribute List.

If an optional non-transitive attribute is unrecognized, it is quietly ignored.

If an optional transitive attribute is unrecognized, the Partial bit (the third high-order bit) in the attribute flags octet is set to 1, and the attribute is retained for export according to the scope in which the attribute appears. A global attribute is exported along every binding specified in this UPDATE message on all connections for which the local peer is a speaker. A path attribute is exported along the bindings from a single path. A per <path, scalable-group> attribute is exported along bindings received from a single path which share a single Scalable-Group-Tag attribute. If an optional attribute is recognized and has a valid value, then, depending on the type of the optional attribute, it is processed locally, retained, and updated, if necessary, for possible export to listener peers. A Peer-Sequence attributes is checked for syntactic correction. If the path is syntactically incorrect (e.g. length is not a multiple of 4), then an ERROR message MUST be sent to the speaker with Error Sub-code set to Malformed Attribute. SXP listener learns the SXP Node-ID from the SXP Node-ID in the OPEN or OPEN_RESP it receives at connection establishment. This Node-ID is used to validate subsequent Peer-Sequence attributes from the same peer. SXP MUST check whether the leftmost (with respect to the position of octets in the protocol message) SXP Node-ID in the Peer-Sequence is equal to the SXP Node-ID of the peer that sent the UPDATE message. If the checks determines that this is not the case, then an ERROR message MUST be sent to the speaker with Error Sub-code set to Malformed Attribute.

5.2. UPDATE Message processing

Processing of an UPDATE message by an SXP listener follows the organization of the message according to the following high level steps:

  1. UPDATE Message validation as described in section 5.1.
  2. Processing of global optional attributes.
  3. Processing of binding delete attributes. Binding delete attributes include any of IPv4-Del-Prefix, IPv6-Del-Prefix, Del-IPv4, or Del-IPv6 attributes.
  4. Processing path-groups.
    1. Process per-path common optional attributes
    2. Add-Prefix groups. Each Add-Prefix group starts with a Scalable-Group-Tag attribute.
      • i. Process per <path, SGT> optional attributed.
      • ii. Process IPv4-Add-Prefix attribute
      • iii. Process IPv6-Add-Prefix attribute
    3. Processing IPv4-Add-Table attribute
    4. Processing IPv6-Add-Table attribute
    5. Processing Add-IPv4 attributes
    6. Processing Add-IPv6 attributes
  5. Processing trailing optional non-transitive attributes

Each path-group starts with an instance of Peer-Sequence attribute.

Processing Binding Delete Attributes (step 3)

If the UPDATE message contains a non-empty IPv4-Del-Prefix/IPv6-Del-Prefix or one or more Del-IPv4/Del-IPv6 attributes, the previously received bindings from the remote speaker peer, whose IP/IPv6 addresses are contained in any of those attributes, SHALL be removed from the SXP input bindings database. Additional processing of a deleted binding depends on whether it was the SXP contributed binding for its IPv4/IPv6 address and whether it was the selected and exported binding by the master binding data-base:

SXP SHALL select a new contribution for the IPv4/IPv6 address of the deleted binding among the bindings for that address from other peers.

SXP SHALL apply the following rules for selecting a binding for reporting to the master binding database:

  1. Shortest Path rule

Choose a binding which has the shortest Peer-Sequence among all the bindings for the same IP address. A binding received without Peer-Sequence attribute (i.e. from an earlier version speaker) is considered as being received with a Peer-Sequence that contains a single NULL SXP Node-ID (value 0).

  1. Most Recently Received rule

If there are more than one bindings with the shortest Peer-Sequence, the binding which has been most recently received is selected.

If the deleted binding was the only binding for its IP address in the SXP input binding database, SXP SHALL report it to the master binding database. The external outcome when no other contributors to the same IP address is present, is that a deletion of the binding using IPv4-Del-Prefix/IPv6-Del-Prefix or Del-IPv4/Del-IPv6 attribute MUST be exported on all connections on which the local end is a speaker.

If there is one or more additional contributors in the master binding database, a new binding is selected and it MUST be exported using IPv4-Add-Prefix/IPv6-Add-Prefix, IPv4-Add-Table/IPv6-Add-Table, or Add-IPv4/Add-IPv6 and associated with a Peer-Sequence that includes only the SXP Node ID of the local instance.

If the binding selection rule has yielded a newly selected binding from SXP input binding database it SHALL be compared to the binding it is replacing, and

Processing Binding Add Attributes (step 4)

5.3. Generating UPDATE Message

Binding Removal Event

If SXP becomes aware of a deletion of a binding for an IPv4/IPv6 prefix it then processes the event as follows:

Binding Change Event

If SXP becomes aware of a new or changed binding for an IPv4/IPv6 prefix it then processes the event as follows:

SXP SHALL flag an SXP originated binding in its input bindings database in order to support UPDATE message processing as specified in this section.

6. SXP Failure Scenarios

The following SXP failure scenarios are analyzed in this section:

SXP connection can fail for various reasons:

7. SXP Timers

There are 5 SXP timers defined in SXP protocol. The details of the timers and the default values of them are described below:

Timer Default Value (seconds)
Retry Open Timer 120
Delete Hold Down Timer 120
Reconciliation Timer 120
Hold Timer 90 (actual value is negotiated)
Keep-alive Timer 30 (one-third of Hold Timer)

8. SXP Version Negotiation

In a software release, SXP always supports the highest version in that release and any version lower than that. SXP version negotiation is per connection base. On the same device, the connections established between this device and the peer devices may have difference versions, depending on the SXP version running on the peer devices.

If the two ends of a SXP connection run the same SXP version, the running version is used. If the two ends are running different versions, the version of the connection is decided with following process:

When the initiator of a connection sends a SXP OPEN message, the highest supported version is coded in the OPEN message.

On the receiver side, if the version in the OPEN message is higher than the version it can support, it sends back OPEN RESPONSE message with the highest version it supports (which is lower than the version in OPEN message). When the OPEN RESPONSE message is received by the initiator, it accepts the version coded in the OPEN RESPONSE. The receiver side’s version is used as the connection’s version.

In summary, SXP connection picks the highest version that is supported on both sides.

8.1. SXP Versions

SXP supports three versions starting from version 2 to 4. The details of what is supported in each of the version follows -

Ver IPv4 Bindings IPv6 Bindings Subnet Binding Expansion Loop Detection SXP Capability Exchange
2 Yes Yes No No No
3 Yes Yes Yes No No
4 Yes Yes Yes Yes Yes

8.2. Error Handling in Older SXP Versions

The error handling scenarios in SXP Version1 & Version 2 are listed below:

8.2.1. SXP Version 1

8.2.2. SXP Version 2

9. Security Considerations

SXP carries mappings between IP addresses and scalable groups. An adversary that can modify the data can potentially gain an advantage over the system. This specification defines the use of TCP MD5 Signature Option [RFC5925] to provide integrity protection for the information. The key used with TCP MD5[RFC1321] should be chosen to be long and have high entropy to provide as much protection from dictionary attacks as possible.

MD5 has been shown to contain weaknesses so this protection may not be sufficient in many environments. In addition, an adversary who can observe the data may be able to use the information learned to identify weak or highly valued targets so it is desirable that confidentiality also be provided. A future revision of this specification will provide more robust security mechanisms such as ones based on IPSEC or TLS.

10. Implementation Note

Current implementations of the protocol support source and peer IP addresses to be IPv4 only. However, the implementation should take care of being able to support IPv6 addresses as well in the future. As such, there are not changes expected to be done in the protocol itself to support IPv6 address based connections.

The SXP password can be upto 80 ASCII characters.

The Node-id used has to be a unique IP address on the device. Current implementations use the highest local IP address on the device and some use the first IP address on the device.

10.1. Bi-Directional SXP

The current implementation of SXP protocol supports bindings to be exchanged in one direction only per connection. In scenarios where bindings have to be sent and received with the same peer, two connections have to be configured with one as Listener and the other one as Speaker. In order to avoid configuring two different connections with the same peer, the procotol supports bi-directional mode. When a connection is configured in bi-directional mode, on either ends only the Listener initiates the socket connection unlike the normal case where either Speaker/Listener can initiate the connection. Once the two socket connections are up the rest of the connection management, bindings updates, etc are same as the uni-directional connection.

11. IANA Considerations

TBD

12. IPR Disclosure

"By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79."

13. Copyright Notice and Disclaimer

"Copyright (C) The IETF Trust (year). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights." Additional copyright notices are not permitted in IETF Documents except in the case where such document is the product of a joint development effort between the IETF and another standards development organization or the document is a republication of the work of another standards organization. Such exceptions must be approved on an individual basis by the IAB.

"This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY , THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."

14. Acknowledgements

Special acknowledgements are due to the following contributors. The protocol was originally developed by Michael Smith, Michael Fine and Awais Nemat. Enhancements have been made over the years by Ronen Arad, Rajesh Bhandari, Lei Fu and Paddy Nallur. Darrin Miller and Kevin Regan provided significant inputs into requirements for later versions of the protocol. Fabio Maino specified the data plane format. Sandeep Rao has contributed to the SGT format in NSH, GRE, VxLan and also the section on tagging exemption for L2 control traffic. Thanks to Joe Salowey and Susan Thomson for edits and review.

15. References

15.1. Normative References

[RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, DOI 10.17487/RFC1321, April 1992.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC5925] Touch, J., Mankin, A. and R. Bonica, "The TCP Authentication Option", RFC 5925, DOI 10.17487/RFC5925, June 2010.
[RFC5926] Lebovitz, G. and E. Rescorla, "Cryptographic Algorithms for the TCP Authentication Option (TCP-AO)", June 2010.

15.2. Informative References

[I-D.draft-guichard-sfc-nsh-dc-allocation] Guichard, J., Smith, M., Kumar, S., Majee, S., Agarwal, P., Glavin, K. and Y. Laribi, "Network Service Header (NSH) Context Header Allocation (Data Center)", December 2014.
[I-D.draft-quinn-sfc-nsh] Quinn, P., Guichard, J., Surendra, S., Smith, M., Henderickx, W., Nadeau, T., Agarwal, P., Manur, R., Chauhan, A., Majee, S., Elzur, U., Melman, D., Garg, P., McConnell, B., Wright, C. and K. Kevin, "Network Service Header", January 2015.
[I-D.draft-smith-vxlan-group-policy] Guichard, J., Smith, M., Kumar, S., Majee, S., Agarwal, P., Glavin, K. and Y. Laribi, "Network Service Header (NSH) Context Header Allocation (Data Center)", December 2014.

Appendix A. SGT as MetaData in Data Plane

The Scalable Group Tag can be carried in the control plane (using SXP described in the main body of this I-D), or in the data plane. SGT can be carried inline as metadata in Cisco MetaData (CMD) or Network Service Header (NSH). CMD can be carried in L2 or as a GRE payload.

The following sections describe format of SGT metadata in CMD and NSH.

This section describes Cisco Metadata (CMD) Version 1, the format for carrying SGT in the data plane at L2. Appendix B describes SGT in NSH, while C and D describes format of SGT CMD and SGT NSH in GRE.

A.1. CMD Format

The CMD format is comprised of a header and payload. The fields are transmitted in network byte order from left to right.

A.1.1. Metadata Header

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 | | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
 |0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 |      Metadata Ethertype       |    Version    |     Length    |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
 |                 Metadata Payload (4-12 bytes)                 |
 +                                                               +
 |                            ...                                |
 +                            ...                                +
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+                      

Metadata Ethertype:

The Metadata field is 2 bytes and contains the ethertype assigned by IEEE for Cisco Metadata (0x8909).

Version

The Version field is 1 byte, and is set to 1 for CMD Header version 1 described in this document.

Length

The Length field is 1 byte, and specifies the length of the CMD in 4-byte units, not including the first 4 bytes. Valid values are in the range of 1-3.

A.1.2. Metadata Payload

Metadata payload is composed of 1 or more options. Only one option is defined in this document.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
| | | | | | | | | | |1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|
|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+
|  Len  |      Option Type      |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                          Value (2-6 bytes)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+-+-+ 

Len

The most significant three bits of the Option are the Option Length, expressed in a multiple of 4 bytes not including the first 4 bytes.

Option Type

After Len, the next 13 bits are the Option Type. For SGT Option, the value is 1.

Value

The Value field is 2-6 bytes of data. For SGT Option, the value is 2 bytes, and contains a 16-bit Scalable Group Tag (SGT). If an Option of Type 1 (SGT Option) is present in the Metadata Payload, it must be included as the first Option in the Payload.

A.1.3. Header Protection

The CMD header does not have a checksum or CRC, and relies on the underlying ethernet CRC for integrity protection.

A.1.4. Header Insertion, Removal, and Relocation

The useful lifespan of a metadata header is expected to be a single zone of authority, such as an enterprise or service provider, where all routers and switches within that zone are under the control of that authority. CMD headers should be inserted upon entrance to a zone of authority or while traversing the zone, and removed just prior to exit from the zone.

A.1.4.1. Insertion

A CMD header can be inserted in a packet at any point during network traversal. Any device wishing to add metadata to a packet should check for a pre-existing header, and add a new header if one is not present.

When used at layer 2, in order to simplify parsing and allow bridging of frames through CMD-unaware bridges, the CMD header shall be inserted after the .1Q tag, if present. If the .1Q tag is not present the CMD header will immediately follow the MAC Source Address. Below is a non-exhaustive list of L2 frames with CMD.

CMD Insertion only:

DA,SA,L3pld -> DA,SA,CMD,L3pld

DA,SA,1Q,L3pld -> DA,SA,1Q,CMD,L3pld

DA,SA,1Q,1Q,L3pld -> DA,SA,1Q,1Q,CMD,L3pld

DA,SA,1Q,MPLS,L3pld -> DA,SA,1Q,CMD,MPLS,L3pld

CMD and MACsec Insertion:

DA,SA,L3pld -> DA,SA,MACsec,CMD,L3pld

DA,SA,1Q,L3pld -> DA,SA,MACsec,1Q,CMD,L3pld

DA,SA,1Q,1Q,L3pld -> DA,SA,MACsec,1Q,1Q,CMD,L3pld

A.1.4.2. Removal

A CMD header should be removed prior to exit from the zone of authority, regardless of contents.

A.2. Assigned Ethernet Type

IEEE has assigned the Ethernet Type Field Number 0x8909 to the CMD protocol.

Appendix B. SGT in Network Services Header (NSH)

NSH Is defined in [I-D.draft-quinn-sfc-nsh]. [I-D.draft-guichard-sfc-nsh-dc-allocation] provides a recommended default allocation for the fixed context headers within NSH. Using the recommended default, SGT is carried as follows in the NSH:

Source Class: Set to SGT associated with source IP address in packet

Destination Class: Set to SGT associated with destination IP address in packet (when known)

Appendix C. SGT CMD in GRE

The format of SGT CMD in GRE (IPv4 Packet) is as shown below:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+
|  L2 Header | L3 Header, proto=47| GRE Header, PT=0x8909   | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------+
|              CMD                |
+---------------+-----------------+
|     PT=0x800  | Original Packet |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+

Appendix D. SGT NSH in GRE

The format of SGT NSH in GRE (IPv4 Packet) is as shown below:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-++-+-+-+-+-+-+-+
|  L2 Header | L3 Header, proto=47| GRE Header, PT=0x8909   | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------+
|              NSH                |
+---------------+-----------------+
|     PT=0x800  | Original Packet |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---+

Appendix E. SGT in VXLAN

The VXLAN Group Based Policy (VXLAN-GBP) Extension is defined in [I-D.draft-smith-vxlan-group-policy]. SGT can be carried in VXLAN-GBP Extension header in the Group Policy ID field with the G-bit set as shown below:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+-+
|G|R|R|R|I|R|R|R|R|D|R|R|A|R|R|R|      Group Policy ID       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-----------+----------------+
|      VXLAN Network Identifier (VNI)       |    Reserved    |
+---------------+-----------------+---------+----------------+

Appendix F. SGT tagging exemption for L2 control traffic

There are many Ethernet L2 control protocols (L2CP) that govern various operational and link states on a device. This includes protocols such as Spanning Tree BPDUs, EPoL, LACP, VTP, PVST, UDLD, LLDP, PAUSE Frames and so on. Typically these protocols have predesignated destination MAC address or ranges. In order to facilitate seamless flow and native processing of these control protocols, insertion of CMD to these protocols SHOULD be exempted.

Authors' Addresses

Michael Smith Cisco Systems 210 West Tasman Drive San Jose, California 95134 United States EMail: michsmit@cisco.com
Rakesh Reddy Kandula Cisco Systems Cessna Business Park, Kadubeesanahalli Village Sarjapur/Marathahalli Outer Ring Road Bangalore, Karnataka 560103 India EMail: krreddy@cisco.com
Syam Appala Cisco Systems 510 McCarthy Blvd. Milpitas, California 95035 United States EMail: syam1@cisco.com