Internet-Draft TTE May 2023
Barth, et al. Expires 4 November 2023 [Page]
Workgroup:
RTGWG Working Group
Internet-Draft:
draft-li-rtgwg-tte-01
Published:
Intended Status:
Informational
Expires:
Authors:
C. Barth
Juniper Networks
T. Li
Juniper Networks
A. Smith
Comcast
B. Wen
Comcast
L. Jalil
Verizon

Tactical Traffic Engineering (TTE)

Abstract

Conventional traffic engineering approaches for resource management used by RSVP-TE and SR-TE often leverage estimates of the ingress traffic demands, during path placement. Unforeseen and/or dynamic events, can skew these estimates by significant enough margins to result in unexpected network congestion. Recomputed paths that address the new demands may take a considerable amount of time, leaving the network in a sub-optimal state for far too long.

This document proposes one mechanism that can avert congestion on a real-time basis.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 4 November 2023.

Table of Contents

1. Introduction

Conventional traffic engineering approaches for resource management used by RSVP-TE [RFC3209] and SR-TE [RFC8402] often leverage estimates of the ingress traffic demands, during path placement. Unforeseen and/or dynamic events, can skew these estimates by significant enough margins to result in unexpected network congestion. Recomputed paths that address the new demands may take a considerable amount of time, leaving the network in a sub-optimal state for far too long.

Ideally, the network should be able to react to unforeseen congestion events and attempt to distribute load automatically, avoiding as much congestion as possible. Such mechanisms should be usable to compliment the conventional traffic engineering approaches.

1.1. Requirement Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Real-Time Congestion

Various economic factors of operating real-world networks at scale require network operators to run their networks at relatively high utilizations. While legacy shortest-path routing is helpful in basic path placement, it does not consider the actual traffic demands on the network, resulting in highly utilized paths that can quickly become congested.

To address this, network operators have adopted various traffic engineering (TE) techniques whereby an input to the path placement for traffic engineering tunnels utilizes a bandwidth prediction and/or a demand matrix, with bandwidth requirements for the major sources and destinations in the network. These predictions are typically estimates based on historical telemetry and capture network and/or TE tunnel usage.

In a more sophisticated view of network demand, a bandwidth requirement can be more accurately viewed as a function over time, with the demand waxing and waning, frequently in a diurnal pattern driven by human interaction. Periodic demands may be driven by complex processes, sometimes scheduled, sometimes in reaction to external events. The salient point is that the elements in the demand matrix are best regarded as time series estimates. As a result, a traffic engineering solution is a set of paths that may themselves vary over time, requiring a complex optimization not only for a static demand but at several points along the time series.

This problem is further compounded by the dynamic changes to demand that were not anticipated in the estimates. Traffic demands may spike or ebb without warning. A singer may announce a concert, causing an unexpected demand as the ticket vendor receives a wave of traffic. Events on social media can cause an unexpected storm of traffic to the media's servers. New product announcements can cause streaming sites to become overloaded.

Network resources are also not always consistently predictable. BGP next-hops change, links fail, hardware fails, and software fails. While traffic engineering tools can do contingency analysis and creating protection paths that should mitigate potential congestion, this is typically only done for single failures. In the rare event of multiple failures, traffic engineering would be forced to completely recalculate a solution, which might not be available for hours.

All of this leaves network operators in the very uncomfortable situation that in unforeseen circumstances, they may find themselves with a congested network, unable to meet Service Level Objectives (SLOs) and potentially in violation of Service Level Agreements (SLAs). Even more confounding is that this situation could happen even if there is sufficient capacity in the network to support the actual demand, but it cannot be implemented until a global optimization computation completes.

3. Real-Time Tactical Traffic Engineering

To address this issue, we propose an alternate strategy: real-time tactical traffic engineering (TTE). This would be a set of mechanisms that would allow the network to react in real-time to avert congestion and optimize traffic flow. This would work in conjunction with traditional traffic engineering bandwidth management techniques, alleviating problems while optimal traffic assignment is being recomputed.

Real-time traffic engineering is a practical approach to a thorny problem: full blown optimality is very hard and can take a long time. A network that can organically, dynamically, and quickly adapt may provide a significant performance improvement while optimal traffic engineering path placement is in-progress.

TTE allows the network to dynamically distribute load if congestion is anticipated. The goal is to simply shift load away from congested links and then, if the link congestion abates, shift traffic back.

If traffic is on an alternate path, then it has the potential to create congestion elsewhere in the network. Bringing the traffic back to its original path causes the network to be more aligned with its original, near-optimal traffic pattern.

3.1. Recognizing Congestion

For TTE to operate effectively, it needs to understand when a link is nearing congestion and conversely, when congestion has abated. Each link that is protected by TTE is sampled periodically for its current utilization. The boundaries of acceptable utilization are defined by high and low utilization thresholds. To avoid oscillation, the link must be outside of acceptable utilization for some consecutive number of periodic samples before any action is performed. Further, the high and low thresholds need to be sufficiently far apart such that small load traffic changes will not cause a shift from high to low or low to high.

3.2. Flows

TTE manipulates traffic flows by changing the IPv4 or IPv6 prefixes found in the Forwarding Information Base (FIB), or by changing label entries found the Label Forwarding Information Base (LFIB). For the purposes of this document, both IP prefixes and labels constitute 'flows'. A flow may have one or more paths to its destination.

3.3. Backup Paths

A number of mechanisms exist that potentially create backup paths for a single flow. Typically, these backup paths are used in case of a failure of the original path and allow for a very rapid transfer of traffic from the failed link to the backup path. For this rapid transfer to work, the backup paths are placed in the forwarding table and then marked as being in a backup and inactive state.

The key properties of a backup path are that it cannot cause forwarding loops and that it avoids the same link that the primary path is using. TTE makes use of backup paths by turning them into active paths in parallel with the primary path. This creates an Equal Cost Multi-Path (ECMP) situation where some traffic will be forwarded down each of the active paths. In most implementations, ECMP is implemented by hashing of the traffic, so each path will have roughly an equal share of the traffic, however, statistical anomalies are not impossible.

Potential backup paths can be computed by several mechanisms:

  • TI-LFA, RLFA: Paths computed by Topology Independent Loop-free Alternates (TI-LFA) [I-D.ietf-rtgwg-segment-routing-ti-lfa] and Remote Loop-Free Alternate (RLFA) [RFC7490] can be used as backup paths.
  • Fast Reroute: Paths computed by RSVP Fast Reroute [RFC4090] can be used as backup paths.
  • Unequal Cost Multi-Path: Interior Gateway Protocols (IGPs) that compute Unequal Cost Multi-Path (UCMP) paths can be used as backup paths.
  • PCE: Paths computed by a Path Computation Element (PCE) [RFC5440] can be used as backup paths.

3.4. Activation and Deactivation

When TTE selects a flow and makes appropriate data plane changes so that traffic is balanced between the primary path(s) and the backup path(s), we say that the flow is 'active'. Similarly, when TTE shifts traffic away from its backup path(s) back to the primary path(s), we say that the flow is 'inactive' or 'deactivated'.

3.5. Downstream Congestion

In a carefully traffic engineered network, any change to the traffic flow may have an impact in multiple places on the network. When a flow is activated, it may shift traffic to an entirely different path, not just around a single link, and the change may result in congestion along the new path. Networks that are engineered to support protection against link failures should already take this into account. However, even this backup capacity can be saturated if TTE activates enough flows on a variety of links. If TTE is also deployed along the backup path, it may be triggered to activate further flows, further distributing the traffic load.

3.6. Flow Distribution

TTE is necessarily stochastic. Since we cannot predict flow utilization (and thus link utilization) with absolute certainty, any flow selection is necessarily an estimate of future behavior. TTE assumes that the flows on the link have a typical Gaussian distribution and that there not many 'elephant' flows that have requirements far above the mean and not many 'mice' flows that have requirements far below the mean. TTE also assumes that there are enough flows that the Law of Large Numbers applies. Our simulation results suggest that even 50 flows with a good Gaussian distribution is ample to meet this requirement.

3.7. Flow Lifetimes

Ideally, TTE would only manipulate long-lived flows, as activating a short-lived flow would be ineffective at redirecting bandwidth. However, knowing the lifetime of any specific traffic stream is effectively impossible and the lifetime of an aggregated flow is also unknowable. Historical or policy information could be added to the approaches described here, and this is a matter for further study.

3.8. Flow Selection

When a link is outside of its bandwidth thresholds, TTE must select certain paths to activate or deactivate. TTE can select one or more paths from one or more flows. Which paths and flows to select is a critical decision that strongly affects how quickly TTE converges to a solution where the link bandwidth is within its thresholds. Ideally, TTE selects the right paths and flows to activate to create a 'working set' that avoids congestion.

When a link is above its high threshold, then the set of candidate flows for activation are those flows on the link that have inactive paths. Similarly, if a link is below its low threshold, then the set of candidate flows are those that have a backup path that has been previously activated. The task of flow selection is to choose the set of candidates to activate or deactivate to bring the link back within its bandwidth thresholds.

In the following sections, we discuss several different possible strategies for flow selection. There are a variety of trade offs between these strategies and the selection of a specific strategy is implementation dependent. Many more strategies are possible.

An implementation may employ policy to restrict the set of flows that may be activated by TTE.

3.8.1. Random Selection

One approach to flow selection is to randomly select a flow from the set of candidate flows. This strategy has a significant advantage in that it does not require any tracking of per-flow bandwidth, and thus has minimal state and overhead requirements. The disadvantage of this approach is that it may not converge very rapidly. If the discrepancy between current bandwidth and target bandwidth is large, it may take many iterations and many flows may have to be activated before sufficient bandwidth has been moved.

In this strategy, overshoot is a distinct possibility. That is, a flow that is selected and activated/deactivated may shift more bandwidth than is required and may cause the link to shift from one extreme to another. However, this is not seriously problematic. Subsequent iterations will correct this and shift bandwidth back. Since each flow selection is an independent event, the odds of continually selecting the same flow is inconsequential, and thus, the odds of persistent oscillation are minimal.

More sophisticated strategies are possible, but do require that we track the bandwidth utilization of each flow.

3.8.2. No Elephants Selection

An alternate selection strategy is to try to select a set of flows that will shift the link bandwidth utilization from its current level to a more comfortable level. We define the 'target bandwidth' as the average of the high and low bandwidth thresholds. The objective of flow selection is then to select flows that, in aggregate, amount to the difference between the target bandwidth and the current bandwidth. We call this the 'target change'.

Flows with a bandwidth bigger than the target change are then effectively elephant flows and should be discarded from the candidate pool. Flows are iteratively randomly selected from the remaining pool until the target change is satisfied.

Intuitively, this seems like an effective strategy, but our simulations indicate that it has poor performance. In some situations, there are simply not enough candidates in the pool to meet the target change. As a result, this strategy may sometimes not converge to a solution. From this, we observe that it may sometimes be best to intentionally overshoot the target by selecting an elephant and then converge by an opposing selection of other flows in the next iteration.

3.8.3. Maximum Fit

A similar strategy is to first exclude elephant flows and then select the largest remaining candidates to meet the target change. This has the benefit that it moves fewer flows than purely random selection. It still suffers from not selecting elephant flows if one is necessary.

Moving fewer flows is beneficial because it is less disruptive to network traffic. Every time a flow is moved, transport protocols may detect a change in latency and thus a change in round-trip time (RTT). This may be misinterpreted as a congestion event and lead to congestion avoidance. It would be beneficial to minimize this impact.

3.8.4. Minimum Fit

The opposite strategy is to first exclude elephant flows and then select the smallest remaining candidates. This has the unfortunate issue of moving the maximum number of flows and retains the problem of not moving elephants when they are necessary.

3.8.5. Best Fit

Analogies to memory allocation problems suggest that selecting the set of candidate flows that most closely total the target change would be a possible strategy. Unfortunately, the number of possibilities is the power set of the number of candidate flows. This can quickly explode and become computationally intractable. This strategy was not simulated.

3.8.6. Maximum Fit with Elephants

This strategy is a derivative of the maximum fit strategy. In this strategy, if the target change cannot be satisfied by selecting all of the non-elephant candidates, then the smallest elephant is selected instead. This strategy showed excellent results.

4. Contributors

The following people have substantially contributed to this document:

Tarek Saad, Hari Kotni, Minto Jeyananth, Raj Chetan Boddireddy, Shraddha Hegde

The authors would like to thank Jim Uttaro for his comments.

5. IANA Considerations

This document makes no requests of IANA.

6. Security Considerations

Tactical Traffic Engineering is a mechanism that shifts traffic across pre-computed paths. It introduces no new protocols and operates completely locally on a single system. As such, it creates no new security issues.

7. Normative References

[I-D.ietf-rtgwg-segment-routing-ti-lfa]
Litkowski, S., Bashandy, A., Filsfils, C., Francois, P., Decraene, B., and D. Voyer, "Topology Independent Fast Reroute using Segment Routing", Work in Progress, Internet-Draft, draft-ietf-rtgwg-segment-routing-ti-lfa-10, , <https://datatracker.ietf.org/doc/html/draft-ietf-rtgwg-segment-routing-ti-lfa-10>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC3209]
Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, DOI 10.17487/RFC3209, , <https://www.rfc-editor.org/info/rfc3209>.
[RFC4090]
Pan, P., Ed., Swallow, G., Ed., and A. Atlas, Ed., "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, DOI 10.17487/RFC4090, , <https://www.rfc-editor.org/info/rfc4090>.
[RFC5440]
Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation Element (PCE) Communication Protocol (PCEP)", RFC 5440, DOI 10.17487/RFC5440, , <https://www.rfc-editor.org/info/rfc5440>.
[RFC7490]
Bryant, S., Filsfils, C., Previdi, S., Shand, M., and N. So, "Remote Loop-Free Alternate (LFA) Fast Reroute (FRR)", RFC 7490, DOI 10.17487/RFC7490, , <https://www.rfc-editor.org/info/rfc7490>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.
[RFC8402]
Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., Decraene, B., Litkowski, S., and R. Shakir, "Segment Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, , <https://www.rfc-editor.org/info/rfc8402>.

Authors' Addresses

Colby Barth
Juniper Networks
1133 Innovation Way
Sunnyvale, CA 94089
United States of America
Tony Li
Juniper Networks
1133 Innovation Way
Sunnyvale, CA 94089
United States of America
Andy Smith
Comcast
1800 Arch St.
Philadelphia, PA 19103
United States of America
Bin Wen
Comcast
1800 Arch St.
Philadelphia, PA 19103
United States of America
Luay Jalil
Verizon
400 International Pkwy
Richardson, TX 75081
United States of America