Internet-Draft ALTO Performance Cost Metrics July 2021
Wu, et al. Expires 27 January 2022
Workgroup:
ALTO Working Group
Internet-Draft:
draft-ietf-alto-performance-metrics-17
Published:
Intended Status:
Standards Track
Expires:
Authors:
Q. Wu
Huawei
Y. Yang
Yale University
Y. Lee
Samsung
D. Dhody
Huawei
S. Randriamasy
Nokia Bell Labs
L. Contreras
Telefonica

ALTO Performance Cost Metrics

Abstract

Cost metric is a basic concept in Application-Layer Traffic Optimization (ALTO), and different applications may use different cost metrics. Since the ALTO base protocol (RFC 7285) defines only a single cost metric (i.e., the generic "routingcost" metric), if an application wants to issue a cost map or an endpoint cost request to determine the resource provider that offers better delay performance, the base protocol does not define the cost metric to be used.

This document addresses the issue by introducing network performance metrics, including network delay, jitter, packet loss rate, hop count, and bandwidth.

There are multiple sources (e.g., estimation based on measurements or service-level agreement) to derive a performance metric. This document introduces an additional "cost-context" field to the ALTO "cost-type" field to convey the source of a performance metric.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119][RFC8174] when, and only when, they appear in all capitals, as shown here.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 27 January 2022.

1. Introduction

Application-Layer Traffic Optimization (ALTO) provides a means for network applications to obtain network status information so that the applications can identify efficient application-layer traffic patterns using the networks. Cost Metric is a basic concept in realizing ALTO, and the concept is used in both the ALTO cost map service and the ALTO endpoint cost service in the ALTO base protocol [RFC7285].

Since different applications may use different cost metrics, the ALTO base protocol introduces an ALTO Cost Metric Registry (Section 14.2 of [RFC7285]) as a systematic mechanism to allow different metrics to be specified. For example, a delay-sensitive application may want to use latency-related metrics, and a bandwidth-sensitive application may want to use bandwidth-related metrics. However, the ALTO base protocol has registered only a single cost metric, i.e., the generic "routingcost" metric; no latency-related or bandwidth-related metrics are defined.

This document registers a set of new cost metrics, specified in Table 1, to allow applications to determine "where" to connect based on network performance criteria such as delay-related and bandwidth-related metrics. This document follows the guideline defined in Section 14.2 of the ALTO base protocol [RFC7285] on registering ALTO cost metrics. Hence, it specifies the identifier, the intended semantics, and the security considerations of each of the metrics defined in Table 1.

+--------------------------+-------------+-------------------+
| Metric                   | Definition  |  Origin Example   |
+--------------------------+-------------+-------------------+
| One-way Delay            | Section 3.1 | [RFC7679]         |
| Round-trip Delay         | Section 3.2 | [RFC2681]         |
| Delay Variation          | Section 3.3 | [RFC3393]         |
| Hop Count                | Section 3.4 | [RFC7285]         |
| Loss Rate                | Section 3.5 | [RFC7680]         |
|                          |             |                   |
| TCP Throughput           | Section 4.1 | [RFC6349]         |
| Residual Bandwidth       | Section 4.2 | [RFC8570]         |
| Max Reservable Bandwidth | Section 4.3 | [RFC5305]         |
+--------------------------+-------------+-------------------+
   Table 1. Cost Metrics Defined in this Document.

The purpose of this document is to ensure proper usage of the performance metrics defined in Table 1; it does not claim novelty of the metrics. The "Origin Example" column of Table 1 gives an example RFC that has defined each metric.

The performance metrics can be classified into two categories: those derived from the performance of individual packets (i.e., one-way delay, round-trip delay, delay variation, hop count, and loss rate), and those related to bandwidth (TCP throughput, residual bandwidth, and maximum reservable bandwidth). These two categories are defined in Sections 3 and 4, respectively. Note that all metrics in Table 1 except round-trip delay are unidirectional; hence, a client that needs values for both directions will need to query each direction.

An ALTO server may provide only a subset of the metrics described in this document. For example, those that are subject to privacy concerns should not be provided to unauthorized ALTO clients. Hence, all cost metrics defined in this document are optional and not all of them need to be exposed to a given application. When an ALTO server supports a cost metric defined in this document, it should announce this metric in its information resource directory (IRD).

[RFC7285] specifies that cost values should be assumed by default as JSONNumber. When defining the value representation of each metric in Table 1, this document conforms to this specification, but specifies additional, generic constraints on valid JSONNumbers for each metric. For example, each metric in Table 1 will be specified as non-negative (>= 0); Hop Count is specified to be an integer.

An ALTO server introducing these metrics should consider security issues. As a generic security consideration on the reliability and trust in the exposed metric values, applications SHOULD rapidly give up using ALTO-based guidance if they detect that the exposed information does not preserve their performance level or even degrades it. This document discusses security considerations in more detail in Section 6.

Following the ALTO base protocol, this document uses JSON to specify the value type of each defined metric. See [RFC8259] for JSON data type specification.

2. Performance Metric Attributes

When defining the metrics in Table 1, this document considers the guideline specified in [RFC6390], which requires that the fine-grained specification of a network performance metric include 6 components: (i) Metric Name, (ii) Metric Description, (iii) Method of Measurement or Calculation, (iv) Units of Measurement, (v) Measurement Points, and (vi) Measurement Timing. Requiring that an ALTO server provide precise, fine-grained values for all 6 components for each metric that it exposes may not be feasible or necessary for all ALTO use cases. For example, the method of measurement or calculation can be complex with substantial details that cannot be exposed to or are unnecessary for ALTO clients in many use cases.

To address the issue and realize ALTO use cases, for metrics in Table 1, this document defines performance metric identifiers which can be used in the ALTO protocol with well-defined (i) Metric Name, (ii) Metric Description, (iv) Units of Measurement, and (v) Measurement Points, which are always specified by the specific ALTO services; for example, endpoint cost service is between the two endpoints. We say that the ALTO performance metric identifiers provide basic metric attributes.

To give an ALTO server the flexibility to provide fine-grained information such as Method of Measurement or Calculation, according to its policy and use cases, this document introduces context information so that the server can provide these additional details.

2.1. Performance Metric Context: cost-context

The core additional details of a performance metric specify "how" the metric is obtained. This is referred to as the source of the metric. Specifically, this document defines three types of coarse-grained metric information sources: "nominal", "sla" (service level agreement), and "estimation".

For a given type of source, precise interpretation of a performance metric value can depend on particular measurement and computation parameters. For example, see Section 3.8 of [RFC7679] on items that a more complete measurement-based report should include.

To make it possible to specify the source and the aforementioned parameters, this document introduces an optional "cost-context" field to the "cost-type" field defined by the ALTO base protocol (Section 10.7 of [RFC7285]) as follows:


    object {
      CostMetric   cost-metric;
      CostMode     cost-mode;
      [CostContext cost-context;]
      [JSONString  description;]
    } CostType;

    object {
      JSONString    cost-source;
      [JSONValue    parameters;]
    } CostContext;


The "cost-source" field of the "cost-context" field MUST be one of three category values: "nominal", "sla", and "estimation". The "cost-context" field is not used as a key to distinguish among performance metrics. Hence, an ALTO information resource MUST NOT announce multiple CostType entries with the same "cost-metric" and "cost-mode"; cost types that differ only in "cost-context" must be placed into different information resources.
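The two constraints above can be checked mechanically. The following Python sketch is illustrative only: the function names `validate_cost_type` and `check_resource` are hypothetical and not part of the ALTO protocol, and cost types are assumed to be already decoded from JSON into dictionaries.

```python
# Category values allowed for "cost-source" by this document.
ALLOWED_SOURCES = {"nominal", "sla", "estimation"}


def validate_cost_type(cost_type):
    """Raise ValueError if an optional "cost-context" has a bad "cost-source"."""
    context = cost_type.get("cost-context")
    if context is None:
        return  # "cost-context" is optional
    source = context.get("cost-source")
    if source not in ALLOWED_SOURCES:
        raise ValueError("invalid cost-source: %r" % (source,))


def check_resource(cost_types):
    """Reject a resource that announces two CostTypes with the same
    "cost-metric" and "cost-mode" (they would differ only in context)."""
    seen = set()
    for cost_type in cost_types:
        validate_cost_type(cost_type)
        key = (cost_type["cost-metric"], cost_type["cost-mode"])
        if key in seen:
            raise ValueError("duplicate cost-metric/cost-mode: %r" % (key,))
        seen.add(key)


# Hypothetical cost type with an "sla" context; the link is a placeholder.
sla_delay = {
    "cost-metric": "delay-ow",
    "cost-mode": "numerical",
    "cost-context": {
        "cost-source": "sla",
        "parameters": {"link": "https://example.com/sla"},
    },
}
check_resource([sla_delay])  # passes silently
```

A server-side configuration check could call `check_resource` once per information resource before publishing its IRD.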

The "nominal" category indicates that the metric value is statically configured by the underlying devices. Not all metrics have reasonable "nominal" values. For example, throughput can have a nominal value, which indicates the configured transmission rate of the devices; latency typically does not have a nominal value.

The "sla" category indicates that the metric value is derived from some commitment which this document refers to as service-level agreement (SLA). Some operators also use terms such as "target" or "committed" values. For an "sla" metric, it is RECOMMENDED that the "parameters" field provides a link to the SLA definition.

The "estimation" category indicates that the metric value is computed through an estimation process. An ALTO server may compute "estimation" values by retrieving and/or aggregating information from routing protocols (e.g., [RFC8571]) and traffic measurement management tools (e.g., TWAMP [RFC5357]), with corresponding operational issues. A potential architecture for estimating these metrics is shown in Figure 1 below. Section 5 will discuss in more detail the operational issues and how a network may address them.

  +--------+   +--------+  +--------+
  | Client |   | Client |  | Client |
  +----^---+   +---^----+  +---^----+
       |           |           |
       +-----------|-----------+
      North-Bound  |ALTO protocol
    Interface (NBI)|
                   |
                +--+-----+  retrieval      +-----------+
                |  ALTO  |<----------------| Routing   |
                | Server |  and aggregation|           |
                |        |<-------------+  | Protocols |
                +--------+              |  +-----------+
                                        |
                                        |  +-----------+
                                        |  |Management |
                                        ---|           |
                                           |  Tool     |
                                           +-----------+
Figure 1. A framework to compute estimations of performance metrics

There can be multiple choices in deciding the cost-source category. It is the operator of an ALTO server who chooses the category. If a metric does not include a "cost-source" value, the application MUST assume that the value of "cost-source" is the most generic "estimation".

2.2. Performance Metric Statistics

The measurement of a performance metric often yields a set of samples from an observation distribution ([Prometheus]), instead of a single value. This document considers that the samples are aggregated as a statistic when reported. Hence, each performance metric's identifier should indicate the statistic (i.e., an aggregation operation), to become


  <metric-identifier> ::= <metric-base-identifier> [ '-' <stat> ]

where <stat> MUST be one of the following:

percentile, with letter 'p' followed by a number:

gives the p-th percentile. Specifically, consider the samples coming from a random variable X. The metric returns the value x such that the probability that X is less than or equal to x is p/100, i.e., Prob(X <= x) = p/100. The number p MUST be a non-negative JSON number in the range [0, 100] (i.e., greater than or equal to 0 and less than or equal to 100). To avoid complex identifiers, the number MUST NOT include the minus or the exp component (Section 6 of [RFC8259]). For example, delay-ow-p75 gives the 75th percentile of observed one-way delay; delay-ow-p99.9 gives the 99.9th percentile of delay. Note that some systems use quantile, which is in the range [0, 1]. This document uses percentile to make the identifier easier to read.

min:

the minimal value of the observations.

max:

the maximal value of the observations.

median:

the midpoint (i.e., p50) of the observations.

mean:

the arithmetic mean value of the observations.

stddev:

the standard deviation of the observations.

stdvar:

the standard variance of the observations.

If a metric has no <stat> (i.e., <metric-identifier> does not include '-' <stat>), the metric MUST be considered as the 50th percentile (median).
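Because base identifiers such as "delay-ow" and "bw-residual" themselves contain a hyphen, a parser cannot simply split on the first '-'. The following Python sketch shows one possible way to separate the base identifier from the statistic. The function name `parse_metric_identifier` and the choice to raise an error for an out-of-range percentile are assumptions made for illustration, not requirements of this document.

```python
import re

NAMED_STATS = {"min", "max", "median", "mean", "stddev", "stdvar"}

# 'p' followed by a JSON number without the minus or exp components.
PERCENTILE_RE = re.compile(r"p((?:0|[1-9][0-9]*)(?:\.[0-9]+)?)")


def parse_metric_identifier(identifier):
    """Split an identifier into (base, stat); stat defaults to "p50"."""
    base, sep, suffix = identifier.rpartition("-")
    if sep:
        if suffix in NAMED_STATS:
            return base, suffix
        match = PERCENTILE_RE.fullmatch(suffix)
        if match:
            if float(match.group(1)) > 100:
                raise ValueError("percentile out of range: " + suffix)
            return base, suffix
    # No recognized <stat> suffix: the whole string is the base identifier,
    # and the metric is treated as the 50th percentile (median).
    return identifier, "p50"


print(parse_metric_identifier("delay-ow-p99.9"))
print(parse_metric_identifier("delay-ow"))
```

Splitting on the last hyphen and testing only the final component keeps "delay-ow" intact while still recognizing "delay-ow-p75" or "bw-residual-mean".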

3. Packet Performance Metrics

This section introduces ALTO network performance metrics on one way delay, round trip delay, delay variation, hop count, and packet loss rate. They measure the "quality of experience" of the stream of packets sent from a resource provider to a resource consumer. The measures of each individual packet (pkt) can include the delay from the time when the packet enters the network to the time when the packet leaves the network (pkt.delay); the number of network hops that the packet traverses (pkt.hopcount); and whether the packet is dropped before reaching the destination (pkt.dropped). The semantics of the performance metrics defined in this section are that they are statistics (percentiles) computed from these measures; for example, the x-percentile of the one-way delay is the x-percentile of the set of delays {pkt.delay} for the packets in the stream.
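As an illustration of how a statistic could be derived from the per-packet measures {pkt.delay}, the Python sketch below aggregates a list of delay samples. This document does not prescribe a percentile estimation method; the nearest-rank method, the use of population (rather than sample) standard deviation and variance, and the function names `percentile` and `aggregate` are all assumptions of this sketch.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample x such that at least
    p percent of the samples are less than or equal to x."""
    if not samples or not 0 <= p <= 100:
        raise ValueError("need at least one sample and 0 <= p <= 100")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def aggregate(samples, stat="p50"):
    """Apply the statistic named by <stat> to the samples."""
    named = {
        "min": min,
        "max": max,
        "median": lambda s: percentile(s, 50),  # median is defined as p50
        "mean": statistics.mean,
        "stddev": statistics.pstdev,     # population form: an assumption
        "stdvar": statistics.pvariance,  # population form: an assumption
    }
    if stat in named:
        return named[stat](samples)
    if stat.startswith("p"):
        return percentile(samples, float(stat[1:]))
    raise ValueError("unknown statistic: " + stat)


# Hypothetical one-way delays in milliseconds for five packets.
delays_ms = [12.0, 10.0, 11.0, 30.0, 10.5]
print(aggregate(delays_ms, "p75"), aggregate(delays_ms, "max"))
```

Routing "median" through `percentile` keeps the two names consistent, which would not be guaranteed if an interpolating median were mixed with a nearest-rank percentile.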

3.1. Cost Metric: One-Way Delay (delay-ow)

3.1.1. Base Identifier

The base identifier for this performance metric is "delay-ow".

3.1.2. Value Representation

The metric value type is a single 'JSONNumber' type value conforming to the number specification of [RFC8259] Section 6. The unit is expressed in milliseconds. Hence, the number can be a floating point number to express a delay that is smaller than one millisecond. The number MUST be non-negative.

3.1.3. Intended Semantics and Use

Intended Semantics: To specify the spatial and temporal aggregated delay of a stream of packets from the specified source to the specified destination. The spatial aggregation level is specified in the query context, e.g., provider-defined identifier (PID) to PID, or endpoint to endpoint.

Use: This metric could be used as a cost metric constraint attribute or as a returned cost metric in the response.

Example 1: Delay value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

{
  "cost-type": {"cost-mode" : "numerical",
                "cost-metric" : "delay-ow"},
  "endpoints" : {
    "srcs": [ "ipv4:192.0.2.2" ],
    "dsts": [
      "ipv4:192.0.2.89",
      "ipv4:198.51.100.34",
      "ipv6:2001:db8::1234:5678"
    ]
  }
}

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {"cost-mode" : "numerical",
                  "cost-metric" : "delay-ow"}
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89"          : 10,
      "ipv4:198.51.100.34"       : 20,
      "ipv6:2001:db8::1234:5678" : 30
    }
  }
}

Comment: Since the "cost-type" does not include a "cost-context" field (and hence no "cost-source"), the values are based on "estimation". Since the identifier does not include the '-' <stat> component, the values represent median values.
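A client consuming a response of this shape could apply the defaulting rule for "cost-source" and then select the destination with the smallest delay. The Python sketch below is one hypothetical way to do so; the helper names `cost_source` and `closest_destination` are illustrative, and the embedded body reuses the values of Example 1.

```python
import json

# Response body shaped like Example 1 (values copied from that example).
RESPONSE_BODY = """
{
  "meta": {
    "cost-type": {"cost-mode": "numerical", "cost-metric": "delay-ow"}
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89": 10,
      "ipv4:198.51.100.34": 20,
      "ipv6:2001:db8::1234:5678": 30
    }
  }
}
"""


def cost_source(cost_type):
    """Return the "cost-source", or "estimation" when no context is given."""
    return cost_type.get("cost-context", {}).get("cost-source", "estimation")


def closest_destination(response, src):
    """Return (destination, delay_ms) with the smallest delay from src."""
    costs = response["endpoint-cost-map"][src]
    dst = min(costs, key=costs.get)
    return dst, costs[dst]


response = json.loads(RESPONSE_BODY)
source = cost_source(response["meta"]["cost-type"])
best = closest_destination(response, "ipv4:192.0.2.2")
print(source, best)
```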

3.1.4. Cost-Context Specification Considerations

"nominal": Typically network one-way delay does not have a nominal value.

"sla": Many networks provide delay in their application-level service level agreements. It is RECOMMENDED that the "parameters" field of an "sla" one-way delay metric includes a link (i.e., a field named "link") providing a URI to the specification of SLA details, if available. This specification can be either free text for possible presentation to the user, or a formal specification. The format of the specification is out of the scope of this document.

"estimation": The exact estimation method is out of the scope of this document. There can be multiple sources to estimate one-way delay. For example, the server may use [RFC8571] (using the unidirectional link delay and the min/max unidirectional link delay) to estimate the path delay. During estimation, the server should be cognizant of potential issues when computing an end-to-end summary statistic from link statistics. Another example of a source to estimate the delay is the IPPM framework [RFC2330]. It is RECOMMENDED that the "parameters" field of an "estimation" one-way delay metric includes a link (a field named "link") providing a URI to a description of the "estimation" method. This description can be either free text for possible presentation to the user, or a formal specification; see [IANA-IPPM] for the specification on fields which should be included. The format of the description is out of the scope of this document.

3.2. Cost Metric: Round-trip Delay (delay-rt)

3.2.1. Base Identifier

The base identifier for this performance metric is "delay-rt".

3.2.2. Value Representation

The metric value type is a single 'JSONNumber' type value conforming to the number specification of [RFC8259] Section 6. The number MUST be non-negative. The unit is expressed in milliseconds.

3.2.3. Intended Semantics and Use

Intended Semantics: To specify spatial and temporal aggregated round-trip delay between the specified source and specified destination. The spatial aggregation level is specified in the query context (e.g., PID to PID, or endpoint to endpoint).

Note that it is possible for a client to query two one-way delays and then compute the round-trip delay. The server should be cognizant of the consistency of values.
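The consistency concern can be made concrete with a small numerical example. In the Python sketch below, the per-packet delay values are invented for illustration and each forward sample is assumed to be paired with a reverse sample. The sum of the two one-way means equals the round-trip mean, but the sum of the two one-way medians does not equal the round-trip median.

```python
import statistics

# Hypothetical paired one-way delays (milliseconds) for three exchanges.
forward_ms = [1.0, 1.0, 10.0]
reverse_ms = [10.0, 1.0, 1.0]
round_trip_ms = [f + r for f, r in zip(forward_ms, reverse_ms)]

# Means are additive: mean(forward) + mean(reverse) == mean(round trip).
mean_sum = statistics.mean(forward_ms) + statistics.mean(reverse_ms)
mean_rt = statistics.mean(round_trip_ms)

# Medians (and percentiles in general) are not additive.
median_sum = statistics.median(forward_ms) + statistics.median(reverse_ms)
median_rt = statistics.median(round_trip_ms)

print(mean_sum, mean_rt)      # equal
print(median_sum, median_rt)  # differ
```

A client that adds two "delay-ow" medians should therefore not expect the result to match a "delay-rt" median reported by the same server.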

Use: This metric could be used either as a cost metric constraint attribute or as a returned cost metric in the response.

Example 2: Round-trip Delay of source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

{
 "cost-type": {"cost-mode" : "numerical",
               "cost-metric" : "delay-rt"},
  "endpoints" : {
     "srcs": [ "ipv4:192.0.2.2" ],
     "dsts": [
       "ipv4:192.0.2.89",
       "ipv4:198.51.100.34",
       "ipv6:2001:db8::1234:5678"
     ]
   }
}

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {"cost-mode" : "numerical",
                  "cost-metric" : "delay-rt"}
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89"          : 4,
      "ipv4:198.51.100.34"       : 3,
      "ipv6:2001:db8::1234:5678" : 2
    }
  }
}

3.2.4. Cost-Context Specification Considerations

"nominal": Typically network round-trip delay does not have a nominal value.

"sla": It is RECOMMENDED that the "parameters" field of an "sla" round-trip delay metric includes a link (a field named "link") providing a URI to the specification of SLA details, if available. This specification can be either free text for possible presentation to the user, or a formal specification. The format of the specification is out of the scope of this document.

"estimation": The exact estimation method is out of the scope of this document. It is RECOMMENDED that the "parameters" field of an "estimation" round-trip delay metric includes a link (a field named "link") providing a URI to a description of the "estimation" method; see Section 3.1.4 for related discussions on the link.

3.3. Cost Metric: Delay Variation (delay-variation)

3.3.1. Base Identifier

The base identifier for this performance metric is "delay-variation".

3.3.2. Value Representation

The metric value type is a single 'JSONNumber' type value conforming to the number specification of [RFC8259] Section 6. The number MUST be non-negative. The unit is expressed in milliseconds.

3.3.3. Intended Semantics and Use

Intended Semantics: To specify spatial and temporal aggregated delay variation (also called delay jitter) with respect to the minimum delay observed on the stream from the specified source to the specified destination. The spatial aggregation level is specified in the query context (e.g., PID to PID, or endpoint to endpoint).

Note that in statistics, variations are typically evaluated by the distance from samples relative to the mean. In networking context, it is more commonly defined from samples relative to the min. This definition follows the networking convention.
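To illustrate the networking convention, the Python sketch below computes per-packet delay variation as the difference between each packet's delay and the minimum delay observed on the stream. The function name `delay_variations` and the sample values are hypothetical.

```python
def delay_variations(delays_ms):
    """Return each delay minus the minimum delay observed on the stream."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    base = min(delays_ms)
    return [delay - base for delay in delays_ms]


# Hypothetical one-way delays in milliseconds.
samples_ms = [10.0, 12.5, 10.0, 15.0]
variations_ms = delay_variations(samples_ms)
print(variations_ms)
```

Every resulting value is non-negative by construction, which matches the value representation of this metric; a statistic such as a percentile would then be taken over these values.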

Use: This metric could be used either as a cost metric constraint attribute or as a returned cost metric in the response.

Example 3: Delay variation value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
   application/alto-endpointcost+json,application/alto-error+json

{
  "cost-type": {"cost-mode" : "numerical",
   "cost-metric" : "delay-variation"},
  "endpoints" : {
    "srcs": [ "ipv4:192.0.2.2" ],
    "dsts": [
      "ipv4:192.0.2.89",
      "ipv4:198.51.100.34",
      "ipv6:2001:db8::1234:5678"
    ]
  }
}

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {
      "cost-mode": "numerical",
      "cost-metric": "delay-variation"
    }
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89"          : 0,
      "ipv4:198.51.100.34"       : 1,
      "ipv6:2001:db8::1234:5678" : 5
    }
  }
}

3.3.4. Cost-Context Specification Considerations

"nominal": Typically network delay variation does not have a nominal value.

"sla": It is RECOMMENDED that the "parameters" field of an "sla" delay variation metric includes a link (a field named "link") providing a URI to the specification of SLA details, if available. This specification can be either free text for possible presentation to the user, or a formal specification. The format of the specification is out of the scope of this document.

"estimation": The exact estimation method is out of the scope of this document. It is RECOMMENDED that the "parameters" field of an "estimation" delay variation metric provides a link ("link") to a description of the "estimation" method. See Section 3.1.4 for related discussions.

3.4. Cost Metric: Hop Count (hopcount)

The hopcount metric is mentioned in [RFC7285] Section 9.2.3 as an example. This section further clarifies its properties.

3.4.1. Base Identifier

The base identifier for this performance metric is "hopcount".

3.4.2. Value Representation

The metric value type is a single 'JSONNumber' type value conforming to the number specification of [RFC8259] Section 6. The number MUST be a non-negative integer (greater than or equal to 0). The value represents the number of hops.

3.4.3. Intended Semantics and Use

Intended Semantics: To specify the number of hops in the path from the specified source to the specified destination. The hop count is a basic measurement of distance in a network and can be exposed as the number of router hops computed from the routing protocols originating this information. The spatial aggregation level is specified in the query context (e.g., PID to PID, or endpoint to endpoint).

Use: This metric could be used as a cost metric constraint attribute or as a returned cost metric in the response.

Example 4: hopcount value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

  {
    "cost-type": {"cost-mode" : "numerical",
     "cost-metric" : "hopcount"},
    "endpoints" : {
      "srcs": [ "ipv4:192.0.2.2" ],
      "dsts": [
        "ipv4:192.0.2.89",
        "ipv4:198.51.100.34",
        "ipv6:2001:db8::1234:5678"
      ]
    }
  }

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {
      "cost-mode": "numerical",
      "cost-metric": "hopcount"
    }
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89"          : 5,
      "ipv4:198.51.100.34"       : 3,
      "ipv6:2001:db8::1234:5678" : 2
    }
  }
}

3.4.4. Cost-Context Specification Considerations

"nominal": Typically hop count does not have a nominal value.

"sla": Typically hop count does not have an SLA value.

"estimation": The exact estimation method is out of the scope of this document. An example of estimating hopcounts is by importing from IGP routing protocols. It is RECOMMENDED that the "parameters" field of an "estimation" hop count metric provides a link ("link") to a description of the "estimation" method.

3.5. Cost Metric: Loss Rate (lossrate)

3.5.1. Base Identifier

The base identifier for this performance metric is "lossrate".

3.5.2. Value Representation

The metric value type is a single 'JSONNumber' type value conforming to the number specification of [RFC8259] Section 6. The number MUST be non-negative. The value represents the percentage of packet losses.

3.5.3. Intended Semantics and Use

Intended Semantics: To specify spatial and temporal aggregated packet loss rate from the specified source to the specified destination. The spatial aggregation level is specified in the query context (e.g., PID to PID, or endpoint to endpoint).
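As a simple illustration, the loss rate of a stream could be derived from the per-packet pkt.dropped measure as a percentage. The Python sketch below assumes the measures are available as a list of booleans; the function name `loss_rate_percent` is hypothetical.

```python
def loss_rate_percent(dropped_flags):
    """Return the percentage of packets whose pkt.dropped flag is set."""
    if not dropped_flags:
        raise ValueError("need at least one packet")
    return 100.0 * sum(dropped_flags) / len(dropped_flags)


# Hypothetical stream: 98 delivered packets and 2 dropped packets.
stream = [False] * 98 + [True] * 2
print(loss_rate_percent(stream))
```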

Use: This metric could be used as a cost metric constraint attribute or as a returned cost metric in the response.

Example 5: Loss rate value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

  {
    "cost-type": {"cost-mode" : "numerical",
                  "cost-metric" : "lossrate"
    },
    "endpoints" : {
      "srcs": [ "ipv4:192.0.2.2" ],
      "dsts": [
        "ipv4:192.0.2.89",
        "ipv4:198.51.100.34",
        "ipv6:2001:db8::1234:5678"
      ]
    }
  }

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {
      "cost-mode": "numerical",
      "cost-metric": "lossrate"
    }
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89"          : 0,
      "ipv4:198.51.100.34"       : 0,
      "ipv6:2001:db8::1234:5678" : 0
    }
  }
}

3.5.4. Cost-Context Specification Considerations

"nominal": Typically packet loss rate does not have a nominal value, although some networks may specify zero losses.

"sla": It is RECOMMENDED that the "parameters" field of an "sla" packet loss rate includes a link (a field named "link") providing a URI to the specification of SLA details, if available. This specification can be either free text for possible presentation to the user, or a formal specification. The format of the specification is out of the scope of this document.

"estimation": The exact estimation method is out of the scope of this document. It is RECOMMENDED that the "parameters" field of an "estimation" packet loss rate metric provides a link ("link") to a description of the "estimation" method. See Section 3.1.4 for related discussions, such as combining link metrics to obtain end-to-end metrics.

4. Bandwidth Performance Metrics

This section introduces three bandwidth-related metrics. Given a specified source and a specified destination, these metrics reflect the volume of traffic that the network can carry from the source to the destination.

4.1. Cost Metric: TCP Throughput (tput)

4.1.1. Base Identifier

The base identifier for this performance metric is "tput".

4.1.2. Value Representation

The metric value type is a single 'JSONNumber' type value conforming to the number specification of [RFC8259] Section 6. The number MUST be non-negative. The unit is bytes per second.

4.1.3. Intended Semantics and Use

Intended Semantics: To give the throughput of a TCP congestion-control conforming flow from the specified source to the specified destination; see [RFC3649] and Section 5.1 of [RFC8312] on how TCP throughput is estimated. The spatial aggregation level is specified in the query context (e.g., PID to PID, or endpoint to endpoint).
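For intuition only, a commonly cited simplified response function for standard TCP approximates the average congestion window as roughly 1.2/sqrt(p) segments per round trip, where p is the packet loss probability. The Python sketch below applies that approximation to produce a value in bytes per second. It is an assumption-laden illustration, not the estimation method of this document; the function name, the parameters, and the choice of this particular model are all hypothetical, and other congestion control algorithms behave differently.

```python
import math


def estimate_tput_bytes_per_s(mss_bytes, rtt_ms, loss_probability):
    """Rough standard-TCP throughput estimate in bytes per second.

    Assumes an average window of about 1.2 / sqrt(p) segments per
    round trip; this is a simplified model, used here only as a sketch.
    """
    if not 0 < loss_probability <= 1:
        raise ValueError("loss_probability must be in (0, 1]")
    if rtt_ms <= 0 or mss_bytes <= 0:
        raise ValueError("rtt_ms and mss_bytes must be positive")
    window_segments = 1.2 / math.sqrt(loss_probability)
    return mss_bytes * window_segments / (rtt_ms / 1000.0)


# Hypothetical path: 1460-byte segments, 100 ms round trip, 1% loss.
print(estimate_tput_bytes_per_s(1460, 100, 0.01))
```

Under this model, throughput falls as either the round-trip delay or the loss probability rises, which is why a server estimating "tput" would typically need delay and loss inputs for the same path.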

Use: This metric could be used as a cost metric constraint attribute or as a returned cost metric in the response.

Example 6: TCP throughput value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

{
  "cost-type": {"cost-mode" : "numerical",
                "cost-metric" : "tput"},
  "endpoints" : {
    "srcs": [ "ipv4:192.0.2.2" ],
    "dsts": [
       "ipv4:192.0.2.89",
       "ipv4:198.51.100.34",
       "ipv6:2001:db8::1234:5678"
    ]
  }
}

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {
      "cost-mode": "numerical",
      "cost-metric": "tput"
    }
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89"          : 256000,
      "ipv4:198.51.100.34"       : 128000,
      "ipv6:2001:db8::1234:5678" : 428000
    }
  }
}

4.1.4. Cost-Context Specification Considerations

"nominal": Typically TCP throughput does not have a nominal value.

"sla": Typically TCP throughput does not have an SLA value.

"estimation": The exact estimation method is out of the scope of this document. See [Prophet] for a method to estimate TCP throughput. It is RECOMMENDED that the "parameters" field of an "estimation" TCP throughput metric provides a link (a field named "link") to a description of the "estimation" method. Note that as TCP congestion control algorithms evolve (e.g., TCP Cubic Congestion Control [RFC8312]), it helps to specify as many details as possible on the congestion control algorithm used. This description can be either free text for possible presentation to the user, or a formal specification. The semantics are out of the scope of this document.

4.2. Cost Metric: Residual Bandwidth (bw-residual)

4.2.1. Base Identifier

The base identifier for this performance metric is "bw-residual".

4.2.2. Value Representation

The metric value type is a single 'JSONNumber' type value that is non-negative. The unit of measurement is bytes per second.

4.2.3. Intended Semantics and Use

Intended Semantics: To specify spatial and temporal residual bandwidth from the specified source to the specified destination. The value is calculated by subtracting tunnel reservations from Maximum Bandwidth (motivated by [RFC8570], Section 4.5). The spatial aggregation unit is specified in the query context (e.g., PID to PID, or endpoint to endpoint).

Use: This metric could be used either as a cost metric constraint attribute or as a returned cost metric in the response.

Example 7: bw-residual value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

{
  "cost-type": {"cost-mode" : "numerical",
                "cost-metric" : "bw-residual"},
  "endpoints" : {
    "srcs": [ "ipv4:192.0.2.2" ],
    "dsts": [
      "ipv4:192.0.2.89",
      "ipv4:198.51.100.34",
      "ipv6:2001:db8::1234:5678"
    ]
  }
}

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {
      "cost-mode": "numerical",
      "cost-metric": "bw-residual"
    }
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89": 0,
      "ipv4:198.51.100.34": 2000,
      "ipv6:2001:db8::1234:5678": 5000
    }
  }
}
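
The "Use" statement above notes that this metric can also serve as a constraint. The following Python sketch illustrates that use. It assumes that the server advertises the "cost-constraints" capability of the base protocol [RFC7285] and accepts its constraint strings (an operator such as "ge" followed by a target value); the helper names are hypothetical. The second helper applies the same test on the client side to the values of the example above.

```python
import operator

# Operators of the base-protocol constraint syntax [RFC7285].
_OPS = {"gt": operator.gt, "lt": operator.lt, "ge": operator.ge,
        "le": operator.le, "eq": operator.eq}

def build_constrained_request(src, dsts, constraints):
    """Endpoint cost request asking only for pairs meeting constraints."""
    return {
        "cost-type": {"cost-mode": "numerical",
                      "cost-metric": "bw-residual"},
        "constraints": list(constraints),
        "endpoints": {"srcs": [src], "dsts": list(dsts)},
    }

def satisfies(value, constraints):
    """True if value meets every constraint string, e.g. "ge 2000"."""
    for constraint in constraints:
        op, target = constraint.split()
        if not _OPS[op](value, float(target)):
            return False
    return True

# Values taken from the example response above (bytes per second).
costs = {
    "ipv4:192.0.2.89": 0,
    "ipv4:198.51.100.34": 2000,
    "ipv6:2001:db8::1234:5678": 5000,
}
usable = {d: v for d, v in costs.items() if satisfies(v, ["ge 2000"])}
print(usable)
```

With the constraint "ge 2000", only the two destinations with at least 2000 bytes per second of residual bandwidth remain.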

4.2.4. Cost-Context Specification Considerations

"nominal": Typically residual bandwidth does not have a nominal value.

"sla": Typically residual bandwidth does not have an "sla" value.

"estimation": The exact estimation method is out of the scope of this document. It is RECOMMENDED that the "parameters" field of an "estimation" residual bandwidth metric provides a link ("link") to a description of the "estimation" method. See Section 3.1.4 on related discussions. The server should be cognizant of issues when computing end-to-end summary statistics from link statistics. For example, the min of the end-to-end path residual bandwidth is the min of all links on the path.

4.3. Cost Metric: Maximum Reservable Bandwidth (bw-maxres)

4.3.1. Base Identifier

The base identifier for this performance metric is "bw-maxres".

4.3.2. Value Representation

The metric value type is a single 'JSONNumber' type value that is non-negative. The unit of measurement is bytes per second.

4.3.3. Intended Semantics and Use

Intended Semantics: To specify spatial and temporal maximum reservable bandwidth from the specified source to the specified destination. The value corresponds to the maximum bandwidth that can be reserved (motivated from [RFC3630] Section 2.5.7). The spatial aggregation unit is specified in the query context (e.g., PID to PID, or endpoint to endpoint).

Use: This metric could be used either as a cost metric constraint attribute or as a returned cost metric in the response.

Example 6: bw-maxres value on source-destination endpoint pairs

POST /endpointcost/lookup HTTP/1.1
Host: alto.example.com
Content-Length: TBA
Content-Type: application/alto-endpointcostparams+json
Accept:
  application/alto-endpointcost+json,application/alto-error+json

{
  "cost-type": {"cost-mode" : "numerical",
                "cost-metric" : "bw-maxres"},
  "endpoints" : {
    "srcs": [ "ipv4:192.0.2.2" ],
    "dsts": [
      "ipv4:192.0.2.89",
      "ipv4:198.51.100.34",
      "ipv6:2001:db8::1234:5678"
    ]
  }
}

HTTP/1.1 200 OK
Content-Length: TBA
Content-Type: application/alto-endpointcost+json

{
  "meta": {
    "cost-type": {
      "cost-mode": "numerical",
      "cost-metric": "bw-maxres"
    }
  },
  "endpoint-cost-map": {
    "ipv4:192.0.2.2": {
      "ipv4:192.0.2.89": 0,
      "ipv4:198.51.100.34": 2000,
      "ipv6:2001:db8::1234:5678": 5000
    }
  }
}

4.3.4. Cost-Context Specification Considerations

"nominal": Typically maximum reservable bandwidth does not have a nominal value.

"sla": Typically maximum reservable bandwidth does not have an "sla" value.

"estimation": The exact estimation method is out of the scope of this document. There can be multiple sources to estimate maximum reservable bandwidth. For example, Maximum reservable bandwidth is defined by IS-IS/OSPF TE, and measures the reservable bandwidth between two directly connected IS-IS neighbors or OSPF neighbors; see Section 3.5 of [RFC5305]. An estimtation can also be computed from [RFC8571] (by using unidirectional maximum reservable bandwidth). It is RECOMMENDED that the "parameters" field of an "estimation" maximum reservable bandwidth metric provides a link ("link") to a description of the "estimation" method. This description can be either free text for possible presentation to the user, or a formal specification. The semantics are out of the scope of this document.

5. Operational Considerations

The exact measurement infrastructure, measurement conditions, and computation algorithms can vary across networks, and are outside the scope of this document. Both the ALTO server and the ALTO clients, however, need to be cognizant of the operational issues discussed below.

Also, the performance metrics specified in this document are similar, in that they may use similar data sources and have similar issues in their calculation. Hence, we specify common issues unless one metric has its unique challenges.

5.1. Source Considerations

The "cost-source" field is added to solve a key issue: an ALTO server needs data sources to compute the cost metrics described in this document, and an ALTO client needs to know the data sources to better interpret the values.

To avoid exposing information that is too fine-grained, this document introduces "cost-source" to indicate only the high-level type of the data source, for example "estimation" or "sla", where "estimation" is a type of measurement data source and "sla" is a type that is based more on policy.

For "estimation", for example, the ALTO server may use log servers or the OAM system as its data source, as recommended by [RFC7971]. In particular, the cost metrics defined in this document can be computed using routing systems as the data sources.

5.2. Metric Timestamp Consideration

Despite the introduction of the additional cost-context information, the metrics do not have a field to indicate the timestamps of the data used to compute the metrics. To convey this information, the ALTO server SHOULD return the HTTP "Last-Modified" header field, which indicates the freshness of the data used to compute the performance metrics.

If the ALTO client obtains updates through an incremental update mechanism [RFC8895], the client SHOULD assume that the metric is computed using a snapshot taken at a time that is approximated by the receiving time.
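
A client-side reading of the two rules above might look like the following Python sketch. It is only an illustration: the function name and its arguments are hypothetical, and the caller is assumed to pass the raw "Last-Modified" value when the response carries one and None otherwise (for example, for values delivered through incremental updates).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def metric_timestamp(last_modified, received_at):
    """Best-effort timestamp of the data behind a metric value.

    Uses the Last-Modified value when the server supplied one;
    otherwise approximates the snapshot time by the receiving time.
    """
    if last_modified is None:
        return received_at
    return parsedate_to_datetime(last_modified)

received = datetime(2021, 7, 27, 10, 0, 30, tzinfo=timezone.utc)
print(metric_timestamp("Tue, 27 Jul 2021 10:00:00 GMT", received))
print(metric_timestamp(None, received))
```

The first call returns the time stated by the server; the second falls back to the receiving time.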

5.3. Backward Compatibility Considerations

One potential issue introduced by the optional "cost-source" field is backward compatibility. Consider an IRD that defines two cost types with the same "cost-mode" and "cost-metric", one with "cost-source" being "estimation" and the other with "cost-source" being "sla". An ALTO client that is not aware of the extension will not be able to distinguish between these two types. A similar issue can arise even with a single cost type whose "cost-source" is "sla": an ALTO client that is not aware of this extension will ignore this field and treat the metric as an estimation.

To address the backward-compatibility issue, if a "cost-metric" is "routingcost" and the metric contains a "cost-context" field, then its "cost-source" MUST be "estimation"; if it is not, the client SHOULD reject the information as invalid.
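
The rule above can be checked by a client with a few lines of code. The Python sketch below is a non-normative illustration; it assumes that the "cost-context" object carries the source in a field named "cost-source", and the function name is hypothetical.

```python
def routingcost_context_is_valid(cost_type):
    """Apply the backward-compatibility rule to one cost type.

    Returns False only for a "routingcost" cost type whose
    "cost-context" names a source other than "estimation".
    """
    if cost_type.get("cost-metric") != "routingcost":
        return True  # the rule covers "routingcost" only
    context = cost_type.get("cost-context")
    if context is None:
        return True  # legacy cost type without the extension
    return context.get("cost-source") == "estimation"

legacy = {"cost-mode": "numerical", "cost-metric": "routingcost"}
bad = {"cost-mode": "numerical", "cost-metric": "routingcost",
       "cost-context": {"cost-source": "sla"}}
print(routingcost_context_is_valid(legacy), routingcost_context_is_valid(bad))
```

A client following the SHOULD above would discard the second cost type and keep the first.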

5.4. Computation Considerations

The metric values exposed by an ALTO server may result from additional processing of measurements from data sources. This may involve data processing tasks such as aggregating the results across multiple systems, removing outliers, and creating additional statistics. The computation of ALTO performance metrics raises two challenges, discussed in the following subsections.

5.4.1. Configuration Parameters Considerations

Performance metrics often depend on configuration parameters, and exposing such configuration parameters can help an ALTO client to better understand the exposed metrics. In particular, an ALTO server may be configured to compute a TE metric (e.g., packet loss rate) in fixed intervals, say every T seconds. To expose this information, the ALTO server may provide the client with two pieces of additional information: (1) when the metrics were last computed, and (2) when the metrics will be updated (i.e., the validity period of the exposed metric values). The ALTO server can expose these two pieces of information by using the HTTP response header fields "Last-Modified" and "Expires".
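
As an illustration, a client could derive the computation interval and the remaining validity of the exposed values from these two header fields. The Python sketch below assumes that the server sets "Expires" to the time of the next scheduled computation and that the headers are available as a dictionary with exactly these names; the function names are hypothetical, and "now" must be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def update_interval(headers):
    """Interval T, in seconds, implied by the two header fields."""
    computed = parsedate_to_datetime(headers["Last-Modified"])
    expires = parsedate_to_datetime(headers["Expires"])
    return (expires - computed).total_seconds()

def seconds_until_update(headers, now):
    """Seconds left before the exposed values are due to change."""
    expires = parsedate_to_datetime(headers["Expires"])
    return max((expires - now).total_seconds(), 0.0)

headers = {
    "Last-Modified": "Tue, 27 Jul 2021 10:00:00 GMT",
    "Expires": "Tue, 27 Jul 2021 10:05:00 GMT",
}
now = datetime(2021, 7, 27, 10, 2, 0, tzinfo=timezone.utc)
print(update_interval(headers), seconds_until_update(headers, now))
```

Here the values were computed at 10:00:00 with T = 300 seconds, so a client asking at 10:02:00 can wait 180 seconds before requesting the metrics again.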

5.4.2. Aggregation Computation Considerations

An ALTO server may not be able to measure directly the performance metrics to be exposed. The basic issue is that the "source" information can often be link level. For example, routing protocols often measure and report only per-link loss, not end-to-end loss; similarly, routing protocols report link-level available bandwidth, not end-to-end available bandwidth. The ALTO server then needs to aggregate these data to provide an abstract and unified view that can be more useful to applications. The server should consider that different metrics may require different aggregation computations. For example, the end-to-end latency of a path is the sum of the latency of the links on the path; the end-to-end available bandwidth of a path is the minimum of the available bandwidth of the links on the path.
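
The point that each metric needs its own aggregation rule can be illustrated with the Python sketch below. The latency and bandwidth rules follow the text above. The loss rule is an assumption that is not made by this document: it treats per-link losses as independent and expects loss rates as fractions between 0 and 1. All names are hypothetical.

```python
import math

def path_latency(link_latencies):
    """End-to-end latency: the sum of the link latencies."""
    return sum(link_latencies)

def path_bandwidth(link_bandwidths):
    """End-to-end available bandwidth: the minimum over the links."""
    return min(link_bandwidths)

def path_loss(link_loss_rates):
    """End-to-end loss rate, ASSUMING independent per-link losses.

    A packet survives the path only if it survives every link.
    """
    return 1.0 - math.prod(1.0 - p for p in link_loss_rates)

# Hypothetical mapping from metric family to aggregation rule.
AGGREGATORS = {
    "latency": path_latency,
    "bandwidth": path_bandwidth,
    "loss": path_loss,
}

print(AGGREGATORS["latency"]([5.0, 10.0, 2.5]))
print(AGGREGATORS["bandwidth"]([8000, 2000, 19000]))
print(round(AGGREGATORS["loss"]([0.1, 0.2]), 6))
```

Keeping the rules in a table keyed by metric family makes it harder to apply, say, a sum to a bandwidth value by accident.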

6. Security Considerations

The properties defined in this document present no security considerations beyond those in Section 15 of the base ALTO specification [RFC7285].

However, concerns addressed in Sections "15.1 Authenticity and Integrity of ALTO Information", "15.2 Potential Undesirable Guidance from Authenticated ALTO Information", and "15.3 Confidentiality of ALTO Information" remain of utmost importance. Indeed, TE performance is highly sensitive ISP information; therefore, sharing TE metric values in numerical mode requires full mutual confidence between the entities managing the ALTO server and the ALTO client. ALTO servers will most likely distribute numerical TE performance to ALTO clients under strict and formal mutual trust agreements. On the other hand, ALTO clients must be cognizant of the risks attached to such information when it has been acquired outside formal conditions of mutual trust.

To mitigate confidentiality risks during the transport of TE performance metrics, the operator should address the risk of ALTO information being leaked to malicious clients or third parties through attacks such as man-in-the-middle (MITM) attacks. As specified in "Protection Strategies" (Section 15.3.2 of [RFC7285]), the ALTO server should authenticate ALTO clients when transmitting an ALTO information resource containing sensitive TE performance metrics. "Authentication and Encryption" (Section 8.3.5 of [RFC7285]) specifies that "ALTO Server implementations as well as ALTO Client implementations MUST support the "https" URI scheme of [RFC2818] and Transport Layer Security (TLS) of [RFC5246]".

7. IANA Considerations

IANA has created and now maintains the "ALTO Cost Metric Registry", listed in Section 14.2, Table 3 of [RFC7285]. This registry is located at <http://www.iana.org/assignments/alto-protocol/alto-protocol.xhtml#cost-metrics>. This document requests the addition of the following entries to the "ALTO Cost Metric Registry".

+-----------------+--------------------+
| Identifier      | Intended Semantics |
+-----------------+--------------------+
| delay-ow        | See Section 3.1    |
| delay-rt        | See Section 3.2    |
| delay-variation | See Section 3.3    |
| hopcount        | See Section 3.4    |
| lossrate        | See Section 3.5    |
| tput            | See Section 4.1    |
| bw-residual     | See Section 4.2    |
| bw-maxres       | See Section 4.3    |
+-----------------+--------------------+

This document requests the creation of the "ALTO Cost Source Registry" with the following currently defined values:

+------------+-----------------------------+
| Identifier | Intended Semantics          |
+------------+-----------------------------+
| nominal    | Values in nominal cases     |
| sla        | Values reflecting service   |
|            | level agreement             |
| estimation | Values by estimation        |
+------------+-----------------------------+

8. Acknowledgments

The authors of this document would like to thank Martin Duke for the highly informative, thorough AD reviews and comments. We thank Brian Trammell, Haizhou Du, Kai Gao, Lili Liu, Geng Li, and Danny Alex Lachos Perez for the reviews and comments.

9. References

9.1. Normative References

[IANA-IPPM]
IANA, "Performance Metrics Registry", <https://www.iana.org/assignments/performance-metrics/performance-metrics.xhtml>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC2330]
Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, "Framework for IP Performance Metrics", RFC 2330, DOI 10.17487/RFC2330, <https://www.rfc-editor.org/info/rfc2330>.
[RFC6390]
Clark, A. and B. Claise, "Guidelines for Considering New Performance Metric Development", BCP 170, RFC 6390, DOI 10.17487/RFC6390, <https://www.rfc-editor.org/info/rfc6390>.
[RFC7285]
Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S., Previdi, S., Roome, W., Shalunov, S., and R. Woundy, "Application-Layer Traffic Optimization (ALTO) Protocol", RFC 7285, DOI 10.17487/RFC7285, <https://www.rfc-editor.org/info/rfc7285>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8259]
Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, <https://www.rfc-editor.org/info/rfc8259>.
[RFC8895]
Roome, W. and Y. Yang, "Application-Layer Traffic Optimization (ALTO) Incremental Updates Using Server-Sent Events (SSE)", RFC 8895, DOI 10.17487/RFC8895, <https://www.rfc-editor.org/info/rfc8895>.

9.2. Informative References

[Prometheus]
Volz, J. and B. Rabenstein, "Prometheus: A Next-Generation Monitoring System".
[Prophet]
Gao, K., Zhang, J., and YR. Yang, "Prophet: Fast, Accurate Throughput Prediction with Reactive Flows", ACM/IEEE Transactions on Networking, July.
[RFC2681]
Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip Delay Metric for IPPM", RFC 2681, DOI 10.17487/RFC2681, <https://www.rfc-editor.org/info/rfc2681>.
[RFC3393]
Demichelis, C. and P. Chimento, "IP Packet Delay Variation Metric for IP Performance Metrics (IPPM)", RFC 3393, DOI 10.17487/RFC3393, <https://www.rfc-editor.org/info/rfc3393>.
[RFC3630]
Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering (TE) Extensions to OSPF Version 2", RFC 3630, DOI 10.17487/RFC3630, <https://www.rfc-editor.org/info/rfc3630>.
[RFC5305]
Li, T. and H. Smit, "IS-IS Extensions for Traffic Engineering", RFC 5305, DOI 10.17487/RFC5305, <https://www.rfc-editor.org/info/rfc5305>.
[RFC5357]
Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, DOI 10.17487/RFC5357, <https://www.rfc-editor.org/info/rfc5357>.
[RFC6349]
Constantine, B., Forget, G., Geib, R., and R. Schrage, "Framework for TCP Throughput Testing", RFC 6349, DOI 10.17487/RFC6349, <https://www.rfc-editor.org/info/rfc6349>.
[RFC7679]
Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton, Ed., "A One-Way Delay Metric for IP Performance Metrics (IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679, <https://www.rfc-editor.org/info/rfc7679>.
[RFC7680]
Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton, Ed., "A One-Way Loss Metric for IP Performance Metrics (IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, <https://www.rfc-editor.org/info/rfc7680>.
[RFC7971]
Stiemerling, M., Kiesel, S., Scharf, M., Seidel, H., and S. Previdi, "Application-Layer Traffic Optimization (ALTO) Deployment Considerations", RFC 7971, DOI 10.17487/RFC7971, <https://www.rfc-editor.org/info/rfc7971>.
[RFC8570]
Ginsberg, L., Ed., Previdi, S., Ed., Giacalone, S., Ward, D., Drake, J., and Q. Wu, "IS-IS Traffic Engineering (TE) Metric Extensions", RFC 8570, DOI 10.17487/RFC8570, <https://www.rfc-editor.org/info/rfc8570>.
[RFC8571]
Ginsberg, L., Ed., Previdi, S., Wu, Q., Tantsura, J., and C. Filsfils, "BGP - Link State (BGP-LS) Advertisement of IGP Traffic Engineering Performance Metric Extensions", RFC 8571, DOI 10.17487/RFC8571, <https://www.rfc-editor.org/info/rfc8571>.

Authors' Addresses

Qin Wu
Huawei
101 Software Avenue, Yuhua District
Nanjing
Jiangsu, 210012
China

Y. Richard Yang
Yale University
51 Prospect St
New Haven, CT 06520
United States of America

Young Lee
Samsung
1700 Alma Drive, Suite 500
Plano, TX 75075
United States of America

Dhruv Dhody
Huawei
Leela Palace
Bangalore 560008
Karnataka
India

Sabine Randriamasy
Nokia Bell Labs
Route de Villejust
91460 Nozay
France

Luis Miguel Contreras Murillo
Telefonica
Madrid
Spain