BUPT T. Pan
Internet-Draft M. Gao
Intended status: Informational E. Song
Expires: October 30, 2020 Z. Bian
X. Lin
Beijing University of Posts and Telecommunications
April 28, 2020

In-band Network-Wide Telemetry
draft-tian-bupt-inwt-mechanism-policy-00

Abstract

This document describes INT-path, a cost-effective network-wide telemetry framework based on INT(In-band Network Telemetry), by decoupling the network monitoring system into a routing mechanism and a routing path generation policy. INT-path embed SR(Source Routing) into INT probes to allow specifying the route that the probe packet takes through the network. Above this probing path control mechanism, an Euler trail-based path planning policy is developed to generate non-overlapped INT paths that cover the entire network with a minimum path member, reducing the overall telemetry overhead. INT-path is very suitable for deployment in data center networks thanks to their symmetric topologies.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119][RFC8174] when, and only when, they appear in all capitals, as shown here.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 30, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

At present, conducting fine-grained, network-wide traffic monitoring plays an increasingly significant role in maintaining large-scale computer networks. It enables fine-grained network-wide visibility to ease the fast detection and localization of network gray failures. It also improves the network traffic load balancing with the prior knowledge of link congestion. The network-wide traffic monitoring can be well applied to all types of networks, especially data center networks, where traffic is highly dynamic and network failures occur silently, while user perception of network latency is expected to be bounded.

In traditional network monitoring, management protocols, such as SNMP(Simple Network Management Protocol)[RFC1157]are coarse-grained and involve a large device query latency due to the constant interaction between the control plane and the data plane. To ameliorate the performance issue, INT is proposed to achieve fine-grained network monitoring. INT allows packets to query device-internal states such as queue depth, queuing latency when they pass through the data plane pipeline, without requiring additional intervention from the control plane CPU. At the last hop of the path, the packet containing the end-to-end monitoring data can be sent to a remote central controller for data analysis. INT can print device-internal states to either specified flows of traffic or additionally introduced probe packets. In this document, our proposal relies on INT’s probe packet mode.

INT is essentially an underlying primitive that need the support of special hardware for device-internal state exposure. To achieve network-wide traffic monitoring, INT further requires a high-level orchestration built upon it. In our proposal, multiple controllable probing paths are generated to monitor the entire network. To reduce the telemetry overhead of additional probe packets for the original network, such orchestration should be better follow the following design principles:

This document addresses the problem of "In-band Network-wide Telemetry", and proposes INT-path, a telemetry framework to achieve lightweight network-wide traffic monitoring. Specifically, we embed SR into INT probes to allow specifying the route the probe packet takes through the network. Based on the routing mechanism, we design two path planning policies to generate multiple non-overlapped INT paths that cover the entire network. The first is based on DFS (Depth-First Search) which is straightforward but computationally-efficient. The second is an Euler trail-based algorithm that can optimally generate non-overlapped INT paths with a minimum path number.

2. Problem Statement

This document proposes INT-path to solve the following technical challenges of building a network-wide telemetry system based on INT:

To tackle these problems, this document proposes INT-path, a framework for network-wide telemetry, by decoupling the system into a routing mechanism and a route generation policy. The underlying mechanism allows network operators or INT agents to specify a particular path for monitoring, addressing the uncontrollable path issue (see section3 for details). The policy built upon the mechanism generates multiple INT paths to cover the entire network and a good policy is expected to minimize the telemetry overhead with the least path overlapping as well as the minimized total path number(see section4 and section5 for details).

3. Source Routing-based Path Monitoring (Mechanism)

This document addresses the uncontrollable path issue via the technique of SR. Figure 1 shows the SR-based telemetry architecture as well as the probe packet format. Although SR is not a new technique, we innovate to couple it with the INT probe using the P4 language [P4] to implement user-specified or on demand path monitoring.

            |<--   Variable length   -->|
            | 22B  |
            +------+------+------+------+
            | INTn | .... | INT2 | INT1 | 
            +------+------+------+------+
             \                         /
                \                    /
                  \               /
                     \         /
+-------+------+------+-------+-------+
|  ETH  |  IP  |  SR  |  INT  |  UDP  | 
+-------+------+------+-------+-------+
              /        \
           /              \
        /                    \ 
     /                          \
   +-------+------+-------+-------+
   | Portn | .... | Port2 | Port1 | 
   +-------+------+-------+-------+
   |  4b   |
   |<----------  512b   --------->|
    

Figure 1. Probe packet format

3.1. Packet Header Format

Theoretically, the SR label stack and the INT label stack can be placed above either the IP header or the UDP header. If placed above the UDP header, they need to occupy an extra port number, which is likely to conflict with the port number chosen by the end hosts for a certain application. While, if placed above the IP header, they only occupy an IP protocol number, and then we can choose an unused protocol number according to existing RFC specifications. Therefore, the program chosen by this document is to place the SR label stack and the INT label stack above the IP header as shown in Figure1. One thing to declare is that although we design a customized header format as follows for the probe packets, the network devices can still correctly forward these packets provided that protocol-independent forwarding is supported.

3.2. Forwarding Behaviors

In the SR-based telemetry architecture, three types of logic routers are proposed with different functionalities: the INT generator, the INT forwarder and the INT collector(as shown in Figure 2).

+--------+                         +--------+ 
|End Host|                         |End Host|
+--------+                         +--------+ 
     |                                 |
     |                                 |
+---------+                       +---------+
|   INT   |                       |   INT   |
|generator|                       |collector|
+---------+                       +---------+ 
 |   |           +---------+           |   |
 |   |           |   INT   |           |   |
 |   |           |forwarder|           |   |
 |   |           +---------+           |   |
 |   |          /           \          |   |
 |   |         /             \         |   |
 |  +---------+               +---------+  |
 |  |   INT   |               |   INT   |  |
 |  |forwarder|               |forwarder|  |
 |  +---------+               +---------+  |
 |             \             /             |
 |              \           /              |
 |               +---------+               |
 |               |   INT   |               |
 |               |forwarder|               |
 |               +---------+               |
 |                                         |
+-------------------------------------------+                        
|                 Controller                |                       
+-------------------------------------------+ 
        

Figure 2. Source routing-based path monitoring

4. DFS-based path planning algorithm

4.1. Algorithm Outline

In this section, we propose a simple algorithm based on DFS.

When traversing a tree or a graph, DFS starts at the root and explores as far as possible along each branch before backtracking. The basic idea of the DFS-based path planning algorithm is to consecutively add the visited vertices into the current path before backtracking; if we have nowhere to go and have to backtrack, we just create a new path and add the fork vertex (the first vertex along the backtracking path that has unvisited edges) as the first node of the new path. After all the edges are visited in the DFS order, we can extract multiple non-overlapped paths covering the entire graph.

A recursive version of the DFS-based path planning algorithm is as follows:

4.2. Example

We show an algorithm example on a network graph of five devices as shown in Figure 3.

+----+     +----+     +----+
| v0 |-----| v1 |-----| v2 |
+----+     +-+--+     +--+-+
             |       /   |
             |      /    |
             |     /     |
             |    /      |
             |   /       |
           +-+--+     +--+-+ 
           | v3 |-----| v4 |
           +----+     +----+
    

Figure 3. Depth-first graph traversal

At the beginning, v0 is pushed into the call stack, path1 = {v0, v1} and the edge between v0 and v1 is marked as visited. The path1 expands as more and more vertices are visited in the DFS order. When path1 expands to {v0, v1, v2, v3, v1}, we have nowhere to go and have to find the fork vertex. At this time, we pop v1 from the stack and return to v3, which has an unvisited edge to v4 (notice that the call stack push/pop operations are implicitly performed during recursion). Then, we identify v3 as the fork vertex because it is the first vertex along the backtracking path that has unvisited edges. Based on v3, we create a new path as path2 = {v3, v4}. When path2 expands to {v3, v4, v2}, we have again nowhere to go and have to backtrack. But at this time, although we check all the vertices popped from the call stack, we still cannot find any fork vertex. The recursion halts when the call stack finally becomes empty. At last, we extract two non-overlapped INT probing paths (i.e., path1 and path2).

5. Euler trail-based path planning algorithm

5.1. Algorithm Outline

Although the DFS-based path planning algorithm is computationally-efficient, it has no guarantee to minimize the number of generated paths, which will potentially increase the telemetry overhead, especially the telemetry workload at the centralized. Here, we propose an optimal path planning algorithm taking advantage of the mathematical properties of the Euler trail/circuit.

In fact, the mathematical properties of the Euler trail/circuit already indicated the theoretical value of the minimum non-overlapped path number for covering a given graph. To achieve the theoretical minimum, each extracted path from a graph should start from one odd vertex and end at another odd vertex. In other words, removing one such path from a graph will eliminate a pair of odd vertices from that graph. According to the above observation, we devise an Euler trail-based algorithm to iteratively extract a path between a pair of odd vertices until all the vertices/edges are extracted from the original graph.

Although the algorithm sounds rather straightforward as an iterative path extraction process, the devil lies in the detail of dealing with several boundary cases. To be more specific, the devil lies in the possibility that an extracted path can split one connected graph into multiple subgraphs, which definitely complicates the iterative path extraction process.

Next we explain the optimal algorithm in detail. We use G to represent the network graph, which is initialized to be one connected graph and may also become multiple disconnected subgraphs caused by path extraction during algorithm iteration. We use Q to represent the path set which is initialized as an empty set and will finally contain the generated non-overlapped INT paths. We use G-p to represent extracting a path p from the graph G which will possibly further split the graph(s) G into more subgraphs.

Actually, without considering the complexity of graph split, for a given connected graph, there are mainly three different cases for path extraction. We propose solutions in each of these three cases as follows:

5.2. Example

Figure 4 shows a path extraction process of the Euler trail-based algorithm. At the start, there is only one connected graph G1 {1,2,3,4,5,6,7} with 4 odd vertices. Since the number of the odd vertices is larger than 2, we randomly choose two odd vertices (1 and 3), extract a path 1-4-3 from G1 and insert the path into Q. The above path extraction behavior will split the original G1 into two subgraphs G1 {1,2,3} and G2 {4,5,6,7}. Since G1 has no odd vertex and set Q is not empty, we paste the path 1-4-3 in Q with the Euler circuit 1-2-3-1 generated from G1 to create a new path 1-2-3-1-4-3. The new path will replace the original path 1-4-3 in Q. After the path paste, there is only one graph G1 {4,5,6,7} with 2 odd vertices. We use the Hierholzer’s algorithm to find its Euler trail 5-4-6-5-7-6 as the second INT path in Q. The algorithm halts after all the paths are extracted.

  +----+3+----+   +----+6+----+
 /      +      \ /      +      \ 
+       |       +       |       +
2       |       4       |       7 
+       |       +       |       +
 \      +      / \      +      / 
  +----+1+----+   +----+5+----+
               (a)
       G1={1,2,3,4,5,6,7};
        S={G1}, T=Empty;
          odd_num=4>2;
           Q={1-4-3};			   

  +----+3         +----+6+----+
 /      +        /      +      \ 
+       |       +       |       +
2       |       4       |       7
+       |       +       |       +
 \      +        \      +      /
  +----+1         +----+5+----+
               (b)
     G1={1,2,3},G2={4,5,6,7};
        S={G1,G2}, T={G1};
   T_path=1-4-3,T_circuit=1-2-3-1;
        path=1-2-3-1-4-3;
         Q={1-2-3-1-4-3};

          +----+6+----+
         /      +      \ 
        +       |       + 
        4       |       7
        +       |       + 
         \      +      /
          +----+5+----+
               (c)
          G1={4,5,6,7};
         S={G2}, T=Empty;
           odd_num=2;
    Q={1-2-3-1-4-3,5-4-6-5-7-6};
    

Figure 4. Path extraction process of the Euler trail-based algorithm

6. IANA Considerations

This document introduces no new security issues.

7. Security Considerations

This document makes no request of IANA.

8. References

8.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

8.2. Informative References

[Hierholzer] Hierholzer, H. and W. Wiener, "Ueber die Möglichkeit, einen Linienzug ohne Wiederholung und ohne Unterbrechung zu umfahren", March 1873.
[P4] Bosshart, P., Daly, D., Gibb, G., Izzard, M., McKeown, N., Rexford, J., Schlesinger, C. and D. Talayco, "P4: programming protocol-independent packet processors", July 2014.
[RFC1157] Case, J., Fedor, M., Schoffstall, M. and J. Davin, "Simple Network Management Protocol (SNMP)", RFC 1157, DOI 10.17487/RFC1157, May 1990.

Authors' Addresses

Tian Pan Beijing University of Posts and Telecommunications Beijing, China EMail: pan@bupt.edu.cn
Minglan Gao Beijing University of Posts and Telecommunications China EMail: gml@bupt.edu.cn
Enge Song Beijing University of Posts and Telecommunications China EMail: songenge@bupt.edu.cn
Zizheng Bian Beijing University of Posts and Telecommunications China EMail: zizheng_bian@bupt.edu.cn
Xingchen Lin Beijing University of Posts and Telecommunications China EMail: linxchen3907@163.com