Internet-Draft Computing-Aware Traffic Steering (CATS) June 2023
Yao, et al. Expires 21 December 2023
Workgroup:
cats
Internet-Draft:
draft-yao-cats-ps-usecases-01
Published:
Intended Status:
Informational
Expires:
Authors:
K. Yao
China Mobile
D. Trossen
Huawei Technologies
M. Boucadair
Orange
LM. Contreras
Telefonica
H. Shi
Huawei Technologies
Y. Li
Huawei Technologies
S. Zhang
China Unicom

Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases and Requirements

Abstract

Many service providers have been exploring distributed computing techniques to achieve better service response time and optimized energy consumption. Such techniques rely upon the distribution of computing services and capabilities over many locations in the network, such as its edge, the metro region, virtualized central office, and other locations. In such a distributed computing environment, providing services by utilizing computing resources hosted in various computing facilities (e.g., edges) is being considered, e.g., for computationally intensive and delay sensitive services. Ideally, services SHOULD be computationally balanced using service-specific metrics instead of simply dispatching the service requests in a static way or optimizing solely connectivity metrics. For example, systematically directing end user-originated service requests to the geographically closest edge or some small computing units may lead to an unbalanced usage of computing resources, which may then degrade both the user experience and the overall service performance. We have named this kind of network with dynamic sharing of edge compute resources "Computing-Aware Traffic Steering" (CATS).

This document describes the problem statement and typical scenarios for CATS. Its aim is to show why, on top of a basic edge computing deployment that provides service equivalency, more factors than proximity need to be considered when steering traffic to an appropriate service instance.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 21 December 2023.

1. Introduction

Network and computing convergence has been evolving in the Internet for a considerable time. With Content Delivery Networks (CDNs) 'frontloading' access to many services, over-the-top service provisioning has become a driving force for many services, such as video, storage, and many others. In addition, network operators have extended their capabilities by complementing their network infrastructure with CDN capabilities, particularly in edge sites. Compared to a CDN-based content cache capability, general edge computing requires more diverse computing resources to be provided in an on-demand manner.

The reason for the fast development of this converged network/compute infrastructure is user demand. On the one hand, users want the best experience, e.g., expressed in low latency and high reliability, for new emerging applications such as high-definition video, AR and VR, live broadcast, and so on. On the other hand, users want a stable experience when moving to different areas.

Generally, edge computing aims to provide better response times and transfer rates compared to Cloud Computing by moving the computing towards the edge of a network. Edge computing can be built on embedded systems, gateways, and others, all located close to end users' premises. There are millions of home gateways, thousands of base stations, and hundreds of central offices in a city that can serve as candidate edges acting as service nodes.

This raises the key problem of deploying services and scheduling traffic to the most suitable computing resource in order to meet the users' (service-specific) demand.

Depending on the location of an edge and its capacity, different computing resources can be contributed by each edge to deliver a service. At peak hours, computing resources attached to a client's closest edge may not be sufficient to handle all the incoming service requests. Users may then experience longer response times or even dropped requests. Increasing the computing resources hosted on each edge to the potential maximum capacity is neither feasible nor economically viable in many cases. Offloading computation-intensive processing to user devices would put heavy pressure on their batteries; moreover, the data set needed for the computation may not exist on the user device because of the size of the data pool or for data governance reasons.

While service providers often have their own sites, which in turn have been upgraded to edge sites, a specific service SHOULD be deployed in multiple edge sites to meet the users' demand. However, deployment alone might not be enough to fully guarantee the quality of service. Instead, functional equivalency MUST be ensured by deploying instances of the same service across edge sites for better availability. Furthermore, load is to be kept balanced in both static and dynamic scenarios. For this, traffic needs to be dynamically steered to the "best" service instance: it MUST be delivered to optimal edge sites according to information that may need to include, e.g., computing information, where the notion of 'best' may highly depend on the application demand.

A particular example is the popular and pervasive 5G Multi-access Edge Computing (MEC) service. In 5G MEC, Uplink Classifier (ULCL) User Plane Functions (UPFs) are deployed close to edge sites; they are capable of effectively classifying and switching uplink traffic to suitable computing resources that might be located in local-area Data Networks (DNs), operators' DNs, or even third-party DNs. Possibly using some 'intelligent' criteria, this could enable the selection of resources with low or high computational power, or of resources matching a combination of requirements.

This document describes sample usage scenarios as well as key areas in which current solutions lead to problems that ultimately affect the deployment (including the performance) of edge services, and proposes the desired features of the CATS system. Those key areas target the identification of candidate solution components.

2. Definition of Terms

This document makes use of the following terms:

Service:
A monolithic functionality that is provided by an endpoint according to the specification for said service. A composite service can be built by orchestrating monolithic services.
Service instance:
Running environment (e.g., a node) that makes the functionality of a service available. One service can have several instances running at different network locations.
Service identifier:
Used to uniquely identify a service, at the same time identifying the whole set of service instances that each represent the same service behavior, no matter where those service instances are running.

3. Problem Statement

3.1. Multi-deployment of Edge Sites and Service

Since edge computing aims to bring computing services closer to users over shorter network paths, there will be more than one edge site hosting the same application in a city, province, or state. A number of representative cities have already deployed multiple edge sites and typical applications, and more edge sites are to be deployed in the future. Before deploying edge sites, some factors need to be considered, such as:

o The existing infrastructure capacities that could be upgraded to edge sites, e.g., operators' machine rooms.

o The amount and frequency of computing resources that are needed.

o The status of the network resources linked to the computing resources.

Once edge sites are deployed, the problem of choosing the optimal edge node on which to deploy services needs to be solved in order to improve the effectiveness of service deployment. More stable, static information SHOULD be considered in service deployment: [I-D.contreras-alto-service-edge] discusses considerations for deploying applications or functions at the edge, such as the type of instance, the CPU/GPU compute flavor, optional storage extensions, and optional hardware acceleration characteristics. Besides those, more network and service factors may be considered, such as:

o Network and computing resource topology: the overall consideration of network access, connectivity, and path protection or redundancy, as well as the location and overall distribution of computing resources in the network and their position relative to the network topology.

o Location: the number of users, the differentiation of service types, the number of connections requested by users, etc. Edge nodes located in popular areas with large numbers of users and service requests may host more service replicas than nodes in other areas.

o Capacity of multiple edge nodes: not only that of a single node, but also the total number of requests that can be processed by the resource pool composed of multiple nodes.

o Service category: for example, whether the service involves multi-user interaction, such as video conferencing or games, or just resource acquisition, such as short video viewing.

ALTO can help to obtain one or more of the above pieces of information, so as to provide suggestions or formulate principles and strategies for service deployment.

To gather this information, the total consumption of computing resources or the total number of accessed sessions could be collected periodically, indicating where more VMs or containers need to be deployed. Unlike request scheduling, service deployment SHOULD still follow the principle of proximity: the more local accesses there are, the more resources SHOULD be deployed. If the resources are insufficient, the operator can be informed to increase the hardware resources.
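As a rough illustration of the proximity-based sizing described above, the following Python sketch turns periodically collected per-site session counts into a number of instances per site and flags sites whose local demand exceeds their hardware. It is a minimal sketch under stated assumptions: `SiteReport`, `plan_deployment`, and the fixed sessions-per-instance capacity are hypothetical and are not defined by this document.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteReport:
    """Periodic usage report for one edge site (hypothetical fields)."""
    name: str
    active_sessions: int  # total number of sessions accessed locally
    max_instances: int    # instances the site's hardware can host


def plan_deployment(reports, sessions_per_instance):
    """Size each site by its own local demand (principle of proximity).

    Returns (plan, alerts): plan maps a site name to the number of VM or
    container instances to run there; alerts lists the sites whose demand
    exceeds their hardware, so the operator can be asked to add resources.
    """
    plan, alerts = {}, []
    for r in reports:
        needed = math.ceil(r.active_sessions / sessions_per_instance)
        plan[r.name] = min(needed, r.max_instances)
        if needed > r.max_instances:
            alerts.append(r.name)
    return plan, alerts
```

For example, with an assumed 100 sessions per instance, a metro site reporting 900 sessions is sized at 9 instances, while a remote site reporting 450 sessions but able to host only 4 instances is capped at 4 and reported to the operator.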

3.2. Traffic Steering among Edge Sites and Service Instances

This section shows the necessity of traffic steering among different edges in a real city, considering the mobility of people in different time slots, events, etc.

Traffic needs to be steered to the appropriate edge sites to meet the application demands. Although computing and network resources are considered when deploying edge sites and services, the resource information referenced at that stage is rather static and cannot account for real-time or near-real-time service requests. That is, in some cases the 'closest' site is not the 'best' one, because the status of computing and network resources varies, as summarized below:

o The closest site may not have enough resources; its load may change dynamically.

o The closest site may not have the required resources, due to heterogeneous hardware in different sites.

Therefore, enhancements to basic edge computing are needed, because in basic edge computing the service request is always steered to the closest edge site.

We assume that clients access one or more services with an objective to meet a desired user experience. Each participating service may be realized at one or more places in the network (called service instances). Such service instances are instantiated and deployed as part of the overall service deployment process, e.g., using existing orchestration frameworks, within so-called edge sites, which in turn are reachable through a network infrastructure via an edge router.

When a client issues a request for a required service, the request is steered to one of the available service instances. Each service instance may act as a client towards another service, thereby having its own outbound traffic steered to a suitable service instance of the requested service, and so on, achieving service composition and chaining as a result.

The aforementioned selection of one of the candidate service instances is done using traffic steering methods, where the steering decision may take into account pre-planned policies (assignment of certain clients to certain service instances), realize shortest-path routing to the 'closest' service instance, or utilize more complex and possibly dynamic metric information, such as the load of service instances, experienced latencies, or similar, for a more dynamic selection of a suitable service instance.
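The three kinds of steering decisions listed above can be contrasted in a small Python sketch. The `Instance` fields, the `select_instance` function, and the additive load penalty used for the dynamic case are assumptions made for illustration; this document does not define a specific selection metric.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A candidate service instance as seen by a steering point (hypothetical)."""
    name: str
    path_cost: float  # e.g., routing metric or latency towards the instance
    load: float       # current utilization, 0.0 (idle) to 1.0 (saturated)


def select_instance(client, instances, mode="dynamic", policy=None, load_weight=5.0):
    """Pick a service instance for `client` using one of three strategies."""
    # 1) Pre-planned policy: a fixed client-to-instance assignment takes precedence.
    if policy and client in policy:
        return policy[client]
    # 2) Shortest path: the 'closest' instance, regardless of its load.
    if mode == "closest":
        return min(instances, key=lambda i: i.path_cost).name
    # 3) Dynamic metrics: path cost plus an assumed penalty for current load.
    return min(instances, key=lambda i: i.path_cost + load_weight * i.load).name
```

With instance A at path cost 2 but 90% load and instance B at path cost 3 but 10% load, the "closest" mode returns A while the dynamic mode returns B, which is the 'closest is not best' situation discussed in this section.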

It is important to note that clients may move throughout the execution of a service, which may, as a result, make another service instance 'better' in terms of latency, load, or other metrics. This creates a (physical) dynamicity that needs to be catered for.

Figure 1 shows a common way to deploy edge sites in a metro area. There is an edge data center in the metro area, which has high computing resources and serves more UEs during working hours, because more office buildings are located in the metro area. There are also some remote edge sites, which have limited computing resources and serve the UEs close to them.

Applications such as AR/VR and video recognition could be deployed both in the edge data center in the metro area and in the remote edge sites. In this case, service requests and resources are well matched. Some potential traffic steering may be needed only for special service requests or small scheduling demands.

     +----------------+    +---+                  +------------+
   +----------------+ |- - |UE1|                +------------+ |
   | +-----------+  | |    +---+             +--|    Edge    | |
   | |Edge server|  | |    +---+       +- - -|PE|            | |
   | +-----------+  | |- - |UE2|       |     +--|   Site 1   |-+
   | +-----------+  | |    +---+                +------------+
   | |Edge server|  | |     ...        |            |
   | +-----------+  | +--+         Potential      +---+ +---+
   | +-----------+  | |PE|- - - - - - -+          |UEa| |UEb|
   | |Edge server|  | +--+         Steering       +---+ +---+
   | +-----------+  | |    +---+       |                  |
   | +-----------+  | |- - |UE3|                  +------------+
   | |  ... ...  |  | |    +---+       |        +------------+ |
   | +-----------+  | |     ...              +--|    Edge    | |
   |                | |    +---+       +- - -|PE|            | |
   |Edge data center|-+- - |UEn|             +--|   Site 2   |-+
   +----------------+      +---+                +------------+
   High computing resource              Limited computing resource
   and more UE at Metro area            and less UE at Remote area
Figure 1: Common Deployment of Edge Sites

Figure 2 shows that during non-working hours, for example at weekends or at night, more UEs move to the remote area close to their homes or to weekend events. There will then be more service requests in the remote area, which has limited computing resources, while the rich computing resources in the metro area may be underused because fewer UEs are there. Many people may request the AR/VR service in the remote area where computing resources are limited. Moreover, as people move from the metro area to the remote area, the edge sites serving common services such as intelligent transportation will also change, so some traffic needs to be steered back to the metro center.

     +----------------+                           +------------+
   +----------------+ |                         +------------+ |
   | +-----------+  | |  Steering traffic    +--|    Edge    | |
   | |Edge server|  | |          +-----------|PE|            | |
   | +-----------+  | |    +---+ |           +--|   Site 1   |-+
   | +-----------+  | |- - |UEa| |    +----+----+-+----------+
   | |Edge server|  | |    +---+ |    |           |           |
   | +-----------+  | +--+       |  +---+ +---+ +---+ +---+ +---+
   | +-----------+  | |PE|-------+  |UE1| |UE2| |UE3| |...| |UEn|
   | |Edge server|  | +--+       |  +---+ +---+ +---+ +---+ +---+
   | +-----------+  | |    +---+ |          |           |
   | +-----------+  | |- - |UEb| |          +-----+-----+------+
   | |  ... ...  |  | |    +---+ |              +------------+ |
   | +-----------+  | |          |           +--|    Edge    | |
   |                | |          +-----------|PE|            | |
   |Edge data center|-+  Steering traffic    +--|   Site 2   |-+
   +----------------+                           +------------+
   High computing resource              Limited computing resource
   and less UE at Metro area            and more UE at Remote area
Figure 2: Steering Traffic among Edge Sites

Network and computing resources also vary in general, so even a UE that is not moving may sometimes experience poor latency, for example because other UEs move, because a large number of requests arrive for temporary events such as a concert or a shopping festival, or simply because of normal changes in network and computing resource status. So, for fixed UEs, traffic is also expected to be steered to appropriate sites dynamically.

These problems indicate that traffic needs to be steered among different edge sites, because of UE mobility and the general variability of network and computing resources. Moreover, some applications described in Section 4 require both low latency and high computing resource usage or specific computing hardware capabilities (such as a local GPU); hence, joint optimization of network and computing resources is needed to guarantee the QoE.

4. Use Cases

This section presents a non-exhaustive list of scenarios which require multiple edge sites to interconnect and to coordinate at the network layer to meet the service demands and ensure better user experience.

4.1. Computing-Aware AR or VR

Cloud VR/AR services are used in some exhibitions, scenic spots, and celebration ceremonies. In the future, they might be used in more applications, such as the industrial internet, the medical industry, and the metaverse.

Cloud VR/AR introduces the concept of cloud computing to the rendering of audiovisual assets in such applications. Here, the edge cloud helps encode/decode and render content. The end device usually only uploads posture or control information to the edge and then VR/AR contents are rendered in the edge cloud. The video and audio outputs generated from the edge cloud are encoded, compressed, and transmitted back to the end device or further transmitted to central data center via high bandwidth networks.

Edge sites may use CPUs or GPUs for encoding/decoding. GPUs usually offer better performance, but CPUs are simpler and more straightforward to use, as well as possibly more widespread in deployment. The available remaining resources determine whether a service instance can be started. The instance's CPU, GPU, and memory utilization has a high impact on the processing delay for encoding, decoding, and rendering. At the same time, the quality of the network path to the edge site is key to the user experience in terms of audio/video quality and input command response times.

A Cloud VR service, such as a mobile gaming service, brings challenging requirements to both network and computing, so the edge node serving a request has to be carefully selected to make sure it has sufficient computing resources and a good network path. For example, for entry-level Cloud VR (panoramic 8K 2D video) with 110-degree Field of View (FOV) transmission, the typical network requirements are 40Mbps of bandwidth, 20ms motion-to-photon latency, and a packet loss rate of 2.4E-5; the typical computing requirements are 8K H.265 real-time decoding and 2K H.264 real-time encoding. The 20ms latency budget can be further divided into:

(i) sensor sampling delay (client), which is less than 1.5ms, including an extra 0.5ms for digitalization and end device processing, and is considered imperceptible by users.

(ii) display refresh delay (client), which takes 7.9ms based on a 144Hz display refresh rate plus 1ms of extra delay to light up.

(iii) image/frame rendering delay (server), which could be reduced to 5.5ms.

(iv) network delay (network), which SHOULD be bounded to 20-1.5-5.5-7.9 = 5.1ms.
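The budget split above is simple arithmetic; the following Python snippet reproduces it, deriving the display refresh delay from the 144Hz refresh rate (one frame period of about 6.94ms plus 1ms to light up) and leaving the remainder of the 20ms motion-to-photon budget to the network. The variable names are illustrative only.

```python
MOTION_TO_PHOTON_MS = 20.0

sensor_ms = 1.5                        # (i) sensor sampling, incl. 0.5ms digitalization
display_ms = round(1000 / 144 + 1, 1)  # (ii) one 144Hz frame (~6.94ms) + 1ms to light up
render_ms = 5.5                        # (iii) server-side image/frame rendering

# (iv) whatever remains of the end-to-end budget is left for the network
network_ms = MOTION_TO_PHOTON_MS - sensor_ms - display_ms - render_ms

print(f"display refresh: {display_ms}ms, network budget: {network_ms:.1f}ms")
# prints: display refresh: 7.9ms, network budget: 5.1ms
```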

So the budgets for server (computing) delay and network delay are almost equivalent, which makes it sensible to consider both computing and network delay. Optimizing either the network or the computing resources alone can neither guarantee the total delay requirement nor find the best choice.

Based on this analysis, some further assumptions are made, as Figure 3 shows: the client could request any service instance among three edge sites. The client-side delay is the same in all cases, while the different edge sites and their corresponding network paths have different delays:

o Edge site 1: The computing delay=4ms based on a light load, and the corresponding network delay=9ms based on heavy traffic.

o Edge site 2: The computing delay=10ms based on a heavy load, and the corresponding network delay=4ms based on light traffic.

o Edge site 3: The computing delay=5ms based on a normal load, and the corresponding network delay=5ms based on normal traffic.

In this case, an optimal total network and computing delay cannot be obtained if the resource is chosen based only on either computing or network status:

o If choosing the edge site based on the best computing delay it will be the edge site 1, the E2E delay=22.4ms.

o If choosing the edge site based on the best network delay it will be the edge site 2, the E2E delay=23.4ms.

o If choosing the edge site based on both of the status it will be the edge site 3, the E2E delay=19.4ms.

So, the best choice to meet the E2E delay requirement is edge site 3, with 19.4ms, which is less than 20ms. The E2E delay of edge site 3 is only 3 to 4ms lower than that of the other two sites, yet it is the only one that meets the application demand.
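The comparison above can be reproduced with a short Python sketch using the per-site delays assumed in Figure 3 and the 9.4ms client-side delay. It contrasts selecting the site with the lowest computing delay, the lowest network delay, and the lowest combined delay; the dictionary layout and function name are illustrative only.

```python
CLIENT_DELAY_MS = 1.5 + 7.9  # sensor sampling + display refresh (client side)
BUDGET_MS = 20.0             # motion-to-photon requirement

# (computing delay, network delay) per edge site, as assumed in Figure 3
sites = {
    "Edge Site 1": (4.0, 9.0),
    "Edge Site 2": (10.0, 4.0),
    "Edge Site 3": (5.0, 5.0),
}


def e2e(site):
    """End-to-end delay = client delay + computing delay + network delay."""
    computing, network = sites[site]
    return CLIENT_DELAY_MS + computing + network


by_computing = min(sites, key=lambda s: sites[s][0])  # computing status only
by_network = min(sites, key=lambda s: sites[s][1])    # network status only
by_both = min(sites, key=e2e)                         # joint optimization

for label, site in [("computing only", by_computing),
                    ("network only", by_network),
                    ("computing + network", by_both)]:
    delay = e2e(site)
    print(f"{label}: {site}, E2E {delay:.1f}ms, meets budget: {delay <= BUDGET_MS}")
```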

The conclusion is that traffic needs to be dynamically steered to the appropriate edge to meet the E2E delay requirements, considering both network and computing resource status. Moreover, computing resources differ significantly between edges, and the 'closest site' may be good for latency but lack GPU support, and therefore SHOULD NOT be chosen.

     Light Load          Heavy Load           Normal load
   +------------+      +------------+       +------------+
   |    Edge    |      |    Edge    |       |    Edge    |
   |   Site 1   |      |   Site 2   |       |   Site 3   |
   +-----+------+      +------+-----+       +------+-----+
computing|delay(4ms)          |           computing|delay(5ms)
         |           computing|delay(10ms)         |
    +----+-----+        +-----+----+         +-----+----+
    |  Egress  |        |  Egress  |         |  Egress  |
    | Router 1 |        | Router 2 |         | Router 3 |
    +----+-----+        +-----+----+         +-----+----+
  network|delay(9ms)   network|delay(4ms)   network|delay(5ms)
         |                    |                    |
         |           +--------+--------+           |
         +-----------|  Infrastructure |-----------+
                     +--------+--------+
                              |
                         +----+----+
                         | Ingress |
         +---------------|  Router |--------------+
         |               +----+----+              |
         |                    |                   |
      +--+--+              +--+---+           +---+--+
    +------+|            +------+ |         +------+ |
    |Client|+            |Client|-+         |Client|-+
    +------+             +------+           +------+
                   client delay=1.5+7.9=9.4ms
Figure 3: Computing-Aware AR or VR

Furthermore, specific techniques may be employed to divide the overall rendering into base assets that are common across a number of clients participating in the service, while client-specific input data is utilized to render additional assets. When delivered to the client, these two kinds of assets are combined into the overall content consumed by the client. The requirements for sending the client input data and for requesting the base assets may differ in terms of which service instances may serve the request. Base assets may be served from any nearby service instance, since they can be served without maintaining cross-request state, while the client-specific input data is processed by a stateful service instance that changes, if at all, only slowly over time, owing to the stickiness created by the client-specific data. Other splits of rendering and input tasks can be found in [TR22.874] for further reading.
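One way to picture the different steering needs of base assets and client-specific input is the Python sketch below: stateless base-asset requests always go to the currently nearest instance, while stateful client input stays pinned to the instance first chosen for that client until that instance disappears. The class, its method names, and the latency table are hypothetical; this document does not prescribe this design.

```python
class SplitRenderingSteering:
    """Hypothetical steering for split rendering (not defined by this document).

    Base-asset requests are stateless, so any nearby instance can serve them;
    client-specific input is stateful, so it sticks to one instance.
    """

    def __init__(self, latency_ms):
        self.latency_ms = latency_ms  # instance name -> current latency (ms)
        self.sticky = {}              # client id -> pinned stateful instance

    def nearest(self):
        return min(self.latency_ms, key=self.latency_ms.get)

    def route(self, client, kind):
        if kind == "base":
            # Stateless: re-evaluate the nearest instance on every request.
            return self.nearest()
        pinned = self.sticky.get(client)
        if pinned not in self.latency_ms:
            # First input from this client, or its instance has gone away.
            pinned = self.nearest()
            self.sticky[client] = pinned
        return pinned
```

If the latency to the pinned instance later rises, base assets move to a closer instance while the client's stateful input keeps flowing to the original one, reflecting the slow-changing stickiness described above.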

When it comes to the service instances themselves, they may be instantiated on demand, e.g., driven by network or client demand metrics, while resources may also be released, e.g., after an idle timeout, to free them up for other services. Depending on the node technologies used, the lifetime of such a "function as a service" may range from many minutes down to a millisecond scale. Therefore, computing resources across participating edges exhibit a distributed (in terms of location) as well as dynamic (in terms of resource availability) nature. To achieve a satisfying service quality for end users, a service request needs to be sent to and served by an edge with sufficient computing resources and a good network path.
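The on-demand lifecycle described above can be sketched as a per-edge pool that instantiates a service when a request arrives and releases it after an idle timeout. This is a minimal Python sketch under simplifying assumptions: one instance per service, a fixed instance capacity per edge, and hypothetical names (`OnDemandPool`, `request`, `release_idle`); the injectable clock only makes the behavior easy to test. A `False` result marks the point at which the request would need to be steered to another edge.

```python
import time


class OnDemandPool:
    """Hypothetical per-edge pool: start instances on demand, release idle ones."""

    def __init__(self, capacity, idle_timeout_s, clock=time.monotonic):
        self.capacity = capacity              # max concurrent instances at this edge
        self.idle_timeout_s = idle_timeout_s  # release after this much inactivity
        self.clock = clock
        self.last_used = {}                   # running service -> last request time

    def release_idle(self):
        """Free instances that have been idle longer than the timeout."""
        now = self.clock()
        for service, t in list(self.last_used.items()):
            if now - t > self.idle_timeout_s:
                del self.last_used[service]

    def request(self, service):
        """Return True if this edge can serve `service` now."""
        self.release_idle()
        if service in self.last_used or len(self.last_used) < self.capacity:
            self.last_used[service] = self.clock()  # reuse or instantiate
            return True
        return False  # no free resources here: steer the request elsewhere
```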

4.2. Computing-Aware Intelligent Transportation

To make transportation more convenient, more video capture devices need to be deployed as urban infrastructure, and better video quality is also required to facilitate content analysis. Therefore, the transmission capacity of the network will need to be further increased, and the collected video data needs further processing, such as pedestrian face recognition, vehicle trajectory recognition, and prediction. This, in turn, also impacts the requirements for the video processing capacity of computing nodes.

In auxiliary driving scenarios, to help overcome the non-line-of-sight problem due to blind spots or obstacles, the edge node can collect comprehensive road and traffic information around the vehicle location and perform data processing; vehicles facing a high safety risk can then be warned accordingly, improving driving safety in complicated road conditions, such as at intersections. This scenario is also called "Electronic Horizon", as explained in [HORITA]. For instance, video image information captured by, e.g., an in-car camera is transmitted to the nearest edge node for processing. Sending the request to the "nearest" edge node is important for being able to collate the video information of "nearby" cars, using, for instance, relative location information. Furthermore, data privacy may lead to the requirement to process the data as close to the source as possible, to limit the spread of data across too many network components.

Nevertheless, the load at specific "closest" nodes may vary greatly, so the closest edge node may become overloaded. This leads to a higher response time and therefore a delay in responding to the auxiliary driving request, possibly resulting in traffic delays or even traffic accidents. Hence, in such cases, delay-insensitive services such as in-vehicle entertainment SHOULD be dispatched to other lightly loaded nodes instead of local edge nodes, so that the delay-sensitive service is preferentially processed locally to ensure service availability and user experience.
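The dispatching rule above, which keeps delay-sensitive services local and moves delay-insensitive ones away from an overloaded closest node, might look like the following Python sketch. The 0.8 utilization threshold, the service class labels, and the `dispatch` function are assumptions made for illustration only.

```python
def dispatch(service_class, local_load, neighbors, threshold=0.8):
    """Choose an edge node for a vehicle's request (illustrative policy).

    service_class: "delay-sensitive" (e.g., auxiliary driving) or
                   "delay-insensitive" (e.g., in-vehicle entertainment)
    local_load:    utilization of the closest edge node, 0.0 to 1.0
    neighbors:     other edge node name -> utilization
    threshold:     assumed load above which the closest node counts as busy
    """
    if service_class == "delay-sensitive" or local_load < threshold:
        return "local"
    # Closest node is busy: offload delay-insensitive work to the least-loaded neighbor.
    return min(neighbors, key=neighbors.get)
```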

In video recognition scenarios, when the number of waiting people and vehicles increases, more computing resources are needed to process the video content. Rush-hour traffic congestion and the weekend flow of people from the edge of a city to the city center also require efficient scheduling of network and computing capacity. Without additional mechanisms, these situations would overload the nearest edge sites, so some of the service request flows may need to be steered to edge sites other than the nearest one.

4.3. Computing-Aware Digital Twin

A number of industry associations, such as the Industrial Digital Twin Association or the Digital Twin Consortium (https://www.digitaltwinconsortium.org/), have been founded to promote the concept of the Digital Twin (DT) for a number of use case areas, such as smart cities, transportation, and industrial control, among others. The core concept of the DT is the "administrative shell" [Industry4.0], which serves as a digital representation of the information and technical functionality pertaining to the "assets" (such as industrial machinery, a transportation vehicle, an object in a smart city, or others) that are intended to be managed, controlled, and actuated.

As an example for industrial control, the programmable logic controller (PLC) may be virtualized and its functionality aggregated across a number of physical assets into a single administrative shell for the purpose of managing those assets. PLCs may be virtualized in order to move the PLC capabilities from the physical assets to the edge cloud. Several PLC instances may exist to enable load balancing and fail-over capabilities, while also enabling physical mobility of the asset and its connection to a suitable "nearby" PLC instance. With this, traffic dynamicity may be similar to that observed in the connected car scenario in the previous sub-section. Crucial here are high availability and bounded latency, since a failure of the (overall) PLC functionality may lead to a production line stop, while violations of the latency bound may lead to losing synchronization with other processes and, ultimately, to production faults, tool failures, or similar.

Particular attention in Digital Twin scenarios is given to the problem of data storage. Here, decentralization, not only driven by the scenario (such as outlined in the connected car scenario for cases of localized reasoning over data originating from driving vehicles) but also through proposed platform solutions, such as those in [GAIA-X], plays an important role. With decentralization, endpoint relations between client and (storage) service instances may frequently change as a result.

4.4. Computing-Aware SD-WAN

SD-WAN provides organizations or enterprises with centralized control over multiple sites, which are network endpoints including branch offices, headquarters, data centers, clouds, and more. An enterprise may deploy its services and applications in different locations to achieve optimal performance. The traffic sent by a host will take the shortest WAN path to the closest server. However, the closest server may not be the best choice, i.e., the one with the lowest combined cost of network and computing resources for the host. If the path computation element can take computing information into account in path computation, the best path with the lowest cost can be provided.

The computing related information can be the number of vCPUs of the VM running the application/services, CPU utilization rate, usage of memory, etc.

An SD-WAN that is aware of the computing resources of applications deployed in multiple sites, and that applies routing policies according to this information, is defined as a computing-aware SD-WAN.

Many enterprises are migrating applications from data centers to the clouds, including public, private, and hybrid clouds. The cloud resources can come from the same provider or from multiple cloud providers; using multiple providers has benefits including disaster recovery, load balancing, and avoiding vendor lock-in.

In such cloudification deployments, SD-WAN provides enterprises with centralized control over Customer Premises Equipment (CPE) in branch offices and cloudified CPEs (vCPEs) in the clouds. The CPEs connect the clients in branch offices to the application servers in the clouds. The same application server deployed in different clouds is called an application instance. Different application instances have different computing resources.

Through the vCPEs, the SD-WAN is aware of the computing resources of the applications deployed in the clouds, and it selects the application instance for a client to visit according to both the computing power and the network state of the WAN.

Figure 1 below illustrates Computing-aware SD-WAN for Enterprise Cloudification.

                                                    +---------------+
   +-------+                      +----------+      |    Cloud1     |
   |Client1|            /---------|   WAN1   |------|  vCPE1  APP1  |
   +-------+           /          +----------+      +---------------+
     +-------+        +-------+
     |Client2| ------ |  CPE  |
     +-------+        +-------+                     +---------------+
   +-------+           \          +----------+      |    Cloud2     |
   |Client3|            \---------|   WAN2   |------|  vCPE2  APP1  |
   +-------+                      +----------+      +---------------+

    Figure 1: Illustration of Computing-aware SD-WAN for Enterprise
                         Cloudification

The current computing load status of application APP1 in Cloud1 and Cloud2 is as follows: each instance uses 6 vCPUs; the load of the instance in Cloud1 is 50%, and the load of the instance in Cloud2 is 20%. The computing resource information of APP1 is collected by vCPE1 and vCPE2, respectively. Client1 and Client2 are visiting APP1 in Cloud1. WAN1 and WAN2 have the same network state. Since the instance in Cloud2 is more lightly loaded, the SD-WAN selects APP1 in Cloud2 for Client3 in the branch office. The traffic of Client3 follows the path: Client3 -> CPE -> WAN2 -> Cloud2 vCPE2 -> Cloud2 APP1
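The selection in this example can be sketched as follows. This is a minimal illustration only, assuming a hypothetical cost model in which each instance's cost is a weighted sum of a relative WAN path cost and its CPU load; the class, function, and weight names are illustrative and are not defined by this document. With equal WAN costs, the lighter load of Cloud2 decides the outcome.

```python
from dataclasses import dataclass


@dataclass
class AppInstance:
    """Status of one application instance as reported by its vCPE (hypothetical model)."""
    site: str
    vcpus: int        # vCPUs allocated to the application instance
    load: float       # CPU utilization in [0, 1]
    wan_cost: float   # relative cost of the WAN path towards the site

    @property
    def idle_vcpus(self) -> float:
        return self.vcpus * (1.0 - self.load)


def select_instance(instances, compute_weight=1.0, network_weight=1.0):
    """Return the instance with the lowest combined network + computing cost.

    The weights are illustrative assumptions, not values taken from this document.
    """
    return min(instances,
               key=lambda i: network_weight * i.wan_cost + compute_weight * i.load)


instances = [
    AppInstance("Cloud1", vcpus=6, load=0.50, wan_cost=1.0),  # via WAN1 / vCPE1
    AppInstance("Cloud2", vcpus=6, load=0.20, wan_cost=1.0),  # via WAN2 / vCPE2
]
best = select_instance(instances)
print(best.site, round(best.idle_vcpus, 1))  # Cloud2 4.8
```

Raising the WAN cost of Cloud2 (e.g., to 2.0) would make Cloud1 the cheaper choice, which is why network and computing state need to be considered together.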

5. Requirements

In the following, we outline the requirements for the CATS system to overcome the observed problems in the realization of the use cases above.

5.1. Support Dynamic and Effective Selection among Multiple Service Instances

The basic requirement of CATS is to support dynamic access to different service instances residing in multiple computing sites while being aware of their status, which is also the fundamental model for enabling traffic steering and further optimizing network and computing services. A unique service identifier is used by all the service instances of a specific service, no matter which edge site an instance is attached to. The mapping of this service identifier to a network locator ensures that data packets can potentially reach any of the service instances deployed in the various edge sites.

Moreover, in the CATS use cases, some applications require low end-to-end (E2E) latency, which warrants a quick mapping of the service identifier to the network locator. This naturally leads to in-band methods, involving the consideration of metrics to make the selection mechanism service-specific, category-specific, or both. Therefore, a desirable system

o MUST provide a discovery and resolution methodology for mapping a service identifier to a specific address.

o MUST provide mapping methods for quickly selecting the service instance.
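A minimal sketch of the discovery/resolution and quick-selection requirements follows, assuming a hypothetical in-memory table from service identifier to instance locators, each annotated with a single already-agreed metric (lower is better). The identifier format "svc:ar-render" and the locators (from the 2001:db8::/32 documentation prefix) are illustrative only; a real CATS system may realize this mapping very differently, e.g., in-band.

```python
class ServiceResolver:
    """Maps a service identifier to the locators of its instances (hypothetical sketch)."""

    def __init__(self):
        self._table = {}  # service_id -> {locator: metric}

    def register(self, service_id, locator, metric):
        """Add or update an instance of a service with its current metric."""
        self._table.setdefault(service_id, {})[locator] = metric

    def withdraw(self, service_id, locator):
        """Remove an instance, e.g., when it is shut down or migrated."""
        self._table.get(service_id, {}).pop(locator, None)

    def resolve(self, service_id):
        """Discovery: all currently known locators for the service identifier."""
        return sorted(self._table.get(service_id, {}))

    def select(self, service_id):
        """Quick selection: the locator with the lowest metric."""
        candidates = self._table.get(service_id)
        if not candidates:
            raise LookupError(f"no instance known for {service_id}")
        return min(candidates, key=candidates.get)


resolver = ServiceResolver()
resolver.register("svc:ar-render", "2001:db8:1::10", 0.7)
resolver.register("svc:ar-render", "2001:db8:2::10", 0.3)
print(resolver.select("svc:ar-render"))  # 2001:db8:2::10
```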

5.2. Support Agreement on Metric Representation

Computing metrics can have many different semantics, particularly because they are service-specific. Even the notion of a "computing load" metric could be represented in many different ways. Such a representation may carry information on the semantics of the metric, or it may consist purely of one or more semantic-free numerals. Agreement on the chosen representation among all service and network elements participating in the service-specific instance selection decision is important. Therefore, a desirable system

o MUST agree on the service-specific metrics and their representation among service elements in the participating edges.

o MAY include network metrics
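To illustrate why the representation needs to be agreed upon, the following sketch encodes a metric as a fixed 7-byte record (type, service category, value) using Python's struct module. The code points, field layout, and units are hypothetical assumptions, not a proposed wire format. A receiver that does not know a metric type cannot interpret the value, which in this sketch surfaces as an error rather than a silent misreading.

```python
import struct
from enum import IntEnum


class MetricType(IntEnum):
    # Hypothetical code points; all participating elements would have to agree on them.
    COMPUTE_LOAD = 1      # CPU utilization, value in percent (0-100)
    PROCESSING_DELAY = 2  # expected processing delay, value in milliseconds
    NETWORK_DELAY = 3     # optional network metric, value in milliseconds


# Network byte order: type (1 byte), service category (2 bytes), value (32-bit float).
RECORD_FMT = "!BHf"


def encode_metric(mtype: MetricType, category: int, value: float) -> bytes:
    return struct.pack(RECORD_FMT, mtype, category, value)


def decode_metric(data: bytes):
    mtype, category, value = struct.unpack(RECORD_FMT, data)
    # MetricType() raises ValueError for a type this receiver does not understand.
    return MetricType(mtype), category, value


record = encode_metric(MetricType.PROCESSING_DELAY, 7, 12.5)
print(len(record), decode_metric(record))
```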

5.3. Support Moderate Metric Distribution

Network path costs in the current routing system usually do not change very frequently. However, computing load and service-specific metrics in general can be highly dynamic, e.g., changing rapidly with the number of sessions, CPU/GPU utilization, memory consumption, etc. It has to be determined at what interval, or based on which events, such information needs to be distributed. Overly frequent distribution, aimed at more accurate synchronization, may result in unnecessary signalling overhead.

Moreover, depending on the service-specific decision logic, one or more metrics will need to be conveyed in a CATS domain. Problems to be addressed here include loop avoidance for any advertisement of metrics as well as the frequency of such conveyance, given the considerable load that a signalling process may add to the overall network traffic. While existing routing protocols may serve as a baseline for signalling metrics, other means of conveying the metrics can equally be considered and realized. Specifically, a desirable system

o MUST provide mechanisms to distribute the metrics

o MUST realize means for rate control when distributing metrics

o MUST implement mechanisms for loop avoidance in distributing metrics, when necessary
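One possible way to combine rate control and loop avoidance is sketched below, under a hypothetical policy. A node advertises a new metric value only if it differs from the last advertised value by at least a threshold and a minimum interval has elapsed. Receivers accept each (origin, sequence number) pair at most once, similar in spirit to the sequence numbers used by link-state routing protocols. Thresholds, intervals, and class names are illustrative.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MetricAdvertiser:
    """Decides when a local metric change is worth advertising (hypothetical policy)."""

    def __init__(self, threshold: float = 0.1, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold        # minimum change worth signalling
        self.min_interval = min_interval  # minimum seconds between advertisements
        self.clock = clock
        self.seq = 0
        self._last_value: Optional[float] = None
        self._last_time = 0.0

    def maybe_advertise(self, value: float) -> Optional[Tuple[int, float]]:
        now = self.clock()
        if self._last_value is not None:
            if abs(value - self._last_value) < self.threshold:
                return None  # change too small: suppress
            if now - self._last_time < self.min_interval:
                return None  # too soon after the previous advertisement: rate-limited
        self.seq += 1
        self._last_value, self._last_time = value, now
        return self.seq, value


class AdvertisementFilter:
    """Accepts each (origin, seq) at most once, so re-flooded copies die out."""

    def __init__(self):
        self._latest: Dict[str, int] = {}  # origin -> highest sequence number seen

    def accept(self, origin: str, seq: int) -> bool:
        if seq <= self._latest.get(origin, 0):
            return False  # duplicate or stale: drop and do not re-flood
        self._latest[origin] = seq
        return True
```

A large change suppressed by the interval check is not retried automatically in this sketch; a real mechanism would likely re-evaluate once the interval expires.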

5.4. Support Flexible Use of Metrics

In some cases, it is crucial to consider not only the network delay but also the computing resources assigned to a service instance on a server, since these may determine critical metrics such as the processing delay. Therefore, the CATS components might use both network and computing metrics for service instance selection. For this, a computing semantic model SHOULD be defined for the mapping selection.

We recognize that different network nodes, e.g., routers, switches, etc., may have diverse capabilities even in the same routing domain, let alone in different administrative domains. So, the service-specific metrics that have been adopted by some nodes may not be supported by others, whether for technical reasons, administrative reasons, or something else. There exist scenarios in which a node supporting service-specific metrics might prefer some types of metrics to others [TR22.874]. Of course, specific metrics might not be utilized at all in other scenarios. Hence, there MUST exist flexibility in terms of metric definition and utilization for the selection of service instances. Therefore, a desirable system

o MUST set up metric information that can be understood by CATS components.

o MUST use network and computing metrics in a flexible way that includes a default action for the interoperation of network nodes which may or may not support the specific metrics.
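The following sketch shows one way a default action might work, assuming each candidate reports a network delay and an optional processing delay (None when the computing metric is unsupported or unavailable). Since mixing full and partial metrics across candidates would compare unlike quantities, this hypothetical policy falls back to network delay for all candidates whenever any computing metric is missing. The function name and metric tuple layout are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# (network_delay_ms, processing_delay_ms or None if unsupported)
Metrics = Tuple[float, Optional[float]]


def select_by_delay(candidates: Dict[str, Metrics], use_computing: bool = True) -> str:
    """Pick the instance with the lowest estimated delay (hypothetical policy)."""
    # Use computing metrics only if enabled and every candidate reports one.
    combined = use_computing and all(proc is not None for _, proc in candidates.values())

    def cost(name: str) -> float:
        net, proc = candidates[name]
        # Default action: network delay only, when computing metrics cannot be used.
        return net + proc if combined else net

    return min(candidates, key=cost)


candidates = {"edge-A": (5.0, 30.0), "edge-B": (12.0, 8.0)}
print(select_by_delay(candidates))                       # edge-B
print(select_by_delay(candidates, use_computing=False))  # edge-A
```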

5.5. Support Session and Service Continuity

In the CATS system, a service may be provided by one or more service instances deployed at different locations in the network. Each instance provides equivalent service functionality to its respective clients. The instance selection decision logic is applied to normal packet-level communication, and packets are forwarded based on the operating status of both network and computing resources. This resource status will likely change over time, potentially causing individual packets to be sent to different network locations, which may segment individual service transactions and break service-level semantics. Moreover, when a client moves, its access point might change, which may likewise lead to a change of service instance. If execution moves from one (e.g., virtualized) service instance to another, state/context needs to be transferred to the new instance. Because such a transfer would otherwise be required, it is desirable to have session persistence (or instance affinity) as the default, removing the need for explicit context transfer, while also supporting explicit state/context transfer (e.g., when metrics change significantly). Session as well as service continuity MUST therefore be maintained in those situations.

The nature of this continuity is highly dependent on the nature of the specific service, and the relationship can be described as an 'instance affinity'. The minimal affinity, that of a single request, represents a stateless service, where each service request may be responded to without any state being held at the service instance for fulfilling the request.

Providing any necessary information/state in-band as part of the service request, e.g., in the form of a multipart form body in an HTTP request or through the URL provided as part of the request, is one way to achieve such a stateless nature.

Alternatively, the affinity to a particular service instance may span more than one request, as in the AR/VR use case, where previous client input is needed to render subsequent frames.

However, a client, e.g., a mobile UE, may have many applications running. If all, or the majority, of the applications request CATS-based services, then the runtime states that need to be created and maintained would be of high granularity. In the extreme case, this granularity could reach the level of per-UE, per-APP, per-(sub)flow state with regard to a service instance. Evidently, such fine-grained runtime states can place a heavy burden on network devices if they have to dynamically create and maintain them. On the other hand, it is not appropriate to place the state-keeping task on the clients themselves either.

Besides, the UE may move to a new (access) network, or the service instance may be migrated to another cloud, making the original service instance unreachable or suboptimal. UE and service instance mobility therefore also need to be considered.

Therefore, a desirable system

o MUST maintain "instance affinity", which MAY span one or more service requests, i.e., all the packets from the same application-level flow MUST go to the same service instance unless the original service instance is unreachable.

o MUST avoid keeping fine-grained runtime state in network nodes for providing session and service continuity.

o MUST provide mechanisms to minimize client-side state in order to achieve instance affinity.

o SHOULD support UE and service instance mobility.
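The affinity semantics can be illustrated with the minimal sketch below, assuming flows are keyed by their 5-tuple and that hypothetical select and is_reachable callbacks are supplied by the rest of the system. It shows only the intended behavior (stick to the pinned instance unless it becomes unreachable). It deliberately keeps per-flow state in a single table, which is exactly the kind of state the requirements above ask to minimize in network nodes and clients, so where and how such state is held remains an open design question.

```python
from typing import Callable, Dict, Hashable


class AffinityTable:
    """Keeps all packets of an application-level flow on one service instance (sketch)."""

    def __init__(self, select: Callable[[], str], is_reachable: Callable[[str], bool]):
        self._select = select              # hypothetical instance selection callback
        self._is_reachable = is_reachable  # hypothetical reachability check
        self._pins: Dict[Hashable, str] = {}

    def instance_for(self, flow_key: Hashable) -> str:
        inst = self._pins.get(flow_key)
        if inst is None or not self._is_reachable(inst):
            inst = self._select()  # first packet of the flow, or original instance lost
            self._pins[flow_key] = inst
        return inst

    def close(self, flow_key: Hashable) -> None:
        """Release the pin when the application-level flow ends."""
        self._pins.pop(flow_key, None)


up = {"inst-1": True, "inst-2": True}
choices = iter(["inst-1", "inst-2"])
table = AffinityTable(select=lambda: next(choices), is_reachable=lambda i: up[i])
flow = ("192.0.2.1", 40000, "198.51.100.7", 443, "tcp")
first = table.instance_for(flow)   # pinned to inst-1
second = table.instance_for(flow)  # still inst-1 (affinity)
up["inst-1"] = False
third = table.instance_for(flow)   # re-pinned to inst-2
```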

5.6. Preserve Communication Confidentiality

Exposing information about computing resources to the network may lead to the leakage of computing domain and application privacy. To prevent this, methods for processing sensitive information related to the computing domain need to be considered. Examples include general anonymization methods, such as hiding key information that identifies devices, using an index to represent the service level of computing resources, or using customized information exposure strategies according to specific application requirements or network scheduling requirements. At the same time, when anonymity is achieved, it is also necessary to consider whether the computing information exposed in the network remains sufficient for effective traffic steering. Therefore, a CATS system

o MUST preserve the confidentiality of the communication relation between user and service provider by minimizing the exposure of user-relevant information according to user needs.
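As an illustration of the approach of using an index to represent the service level of computing resources, the sketch below maps detailed computing metrics to a coarse level before anything is exposed to the network. The thresholds, level scale, and field names are hypothetical. Whether such a coarse index still carries enough information for effective traffic steering is precisely the trade-off discussed above.

```python
def service_level(cpu_load: float, free_mem_gb: float) -> int:
    """Map detailed metrics to a coarse service-level index (hypothetical thresholds).

    3 = ample capacity, 0 = saturated.
    """
    if cpu_load >= 0.9 or free_mem_gb < 1:
        return 0
    if cpu_load >= 0.7:
        return 1
    if cpu_load >= 0.4:
        return 2
    return 3


def export_advertisement(service_id: str, cpu_load: float, free_mem_gb: float) -> dict:
    # Only the service identifier and the coarse level leave the computing domain;
    # raw utilization, memory figures, and device identities are not exported.
    return {"service": service_id, "level": service_level(cpu_load, free_mem_gb)}


print(export_advertisement("svc:video", 0.35, 16))  # {'service': 'svc:video', 'level': 3}
```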

6. Security Considerations

The CATS decision-making process is deeply related to computing and network status as well as to some service information. The following security issues need to be considered when designing a CATS system.

* Service data sometimes needs to be moved among different edge sites to maintain service consistency and availability. Service data MUST be protected from interception.

* The act of making compute requests may reveal the nature of a user's activities, so this SHOULD be hidden as much as possible.

* The behavior of the network can be adversely affected by modifying or interfering with advertisements of computing resource availability. Such attacks could deprive users of the services they desire and might be used to divert traffic to interception points. Therefore, care is needed to secure advertisements and to prevent rogue nodes from participating in the network.
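As one example of securing advertisements, the sketch below attaches an HMAC-SHA256 tag to a metric advertisement using Python's standard hmac module, assuming a pre-shared key between legitimate nodes and a canonical JSON encoding of the advertisement. This is an illustrative assumption, not a mechanism specified by this document; it does not address key distribution, replay protection, or confidentiality.

```python
import hashlib
import hmac
import json


def _canonical(adv: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to authenticate.
    return json.dumps(adv, sort_keys=True, separators=(",", ":")).encode()


def sign_advertisement(key: bytes, adv: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over the canonical encoding."""
    tag = hmac.new(key, _canonical(adv), hashlib.sha256).hexdigest()
    return {"adv": adv, "tag": tag}


def verify_advertisement(key: bytes, msg: dict) -> bool:
    """Recompute the tag and compare in constant time; False means tampered or wrong key."""
    expected = hmac.new(key, _canonical(msg["adv"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["tag"])


key = b"example-shared-key"  # hypothetical pre-shared key
msg = sign_advertisement(key, {"service": "svc:video", "level": 3, "seq": 42})
print(verify_advertisement(key, msg))  # True
```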

7. IANA Considerations

This document makes no requests for IANA action.

8. Contributors

The following people have substantially contributed to this document:

        Peter Willis
        BT

        Philip Eardley
        ietf.philip.eardley@gmail.com

        Tianji Jiang
        China Mobile
        tianjijiang@chinamobile.com

        Markus Amend
        Deutsche Telekom
        Markus.Amend@telekom.de

        Guangping Huang
        ZTE
        huang.guangping@zte.com.cn

9. Acknowledgements

The authors would like to thank Adrian Farrel, Peng Liu, Luigi Iannone, Christian Jacquenet, and Yuexia Fu for their valuable suggestions on this document.

10. References

10.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.

10.2. Informative References

[I-D.contreras-alto-service-edge]
Contreras, L. M., Randriamasy, S., Ros-Giralt, J., Perez, D. A. L., and C. E. Rothenberg, "Use of ALTO for Determining Service Edge", Work in Progress, Internet-Draft, draft-contreras-alto-service-edge-07, <https://datatracker.ietf.org/doc/html/draft-contreras-alto-service-edge-07>.
[TR22.874]
3GPP, "Study on traffic characteristics and performance requirements for AI/ML model transfer in 5GS (Release 18)".
[HORITA]
Horita, Y., "Extended electronic horizon for automated driving", Proceedings of the 14th International Conference on ITS Telecommunications (ITST).
[Industry4.0]
Industry4.0, "Details of the Asset Administration Shell, Part 1 & Part 2".
[GAIA-X]
Gaia-X, "GAIA-X: A Federated Data Infrastructure for Europe".

Appendix A. Potential Future Use Cases

* Computing-aware SFC in a single data center (DC):

In a data center, many applications and different service functions are deployed. Before north-south or east-west traffic is steered to an application, the traffic SHOULD first follow a specific SFP (Service Function Path) with service functions such as a firewall or an IPS (Intrusion Prevention System). The same type of service function in the DC can have multiple instances deployed on different servers or VMs connected to different switches. The following scenarios apply to computing-aware SFC:

First, SFPs are constructed as an ordered chain of SFs, considering the computing resources of the SFs and the cost or latency of the network paths between the switches. Normally, an SF with more available computing resources is selected.

Second, multiple SFPs satisfying the same ordered-SF constraints may be set up. When selecting a specific SFP from among them for traffic matching a specific classification rule, the SFP with lightly loaded SFs is preferred.

In Figure 1, the DC has two firewalls, and north-south traffic to the application is steered to the SFP consisting of the lightly loaded FW1 and the IPS.

              +------+   +------+   +------+
              |      |   |      |   |      |
              |      |   |      |   |      |
              |  FW1 |   |  FW2 |   | IPS  |
              +------+   +------+   +------+
                 |          |          |
                 +          +          +
               +----+     +----+     +----+     +----+
    traffic--->| S1 |+--->| S2 |+--->| S3 |+--->| S4 |+-->application
               +----+     +----+     +----+     +----+

       Figure 1: Illustration of Computing-aware SFC in single DC

* Computing-aware SFC in multiple data centers (DCs):

In carrier networks, operators may deploy multiple geographically dispersed data centers or computing resource pools. These data centers can host diverse types of value-added services (VASes), such as FW (Firewall), IPS (Intrusion Prevention System), WOC (Web Optimization Control), and VO (Video Optimizer), shared by enterprise leased-line services, Internet services, etc.

Each data center may host different types of service functions. For example, frequently used service functions are deployed in edge or regional data centers, while less frequently used ones are deployed in global or central data centers. Consequently, SFCs with different types of service functions may span multiple data centers.

The same service function can be deployed in multiple data centers. In such deployments, the copy of the SF in a given data center is called an SF instance. SFPs are constructed as an ordered chain of SFs, each of which is taken from a specific data center.

The path computation of an SFP SHOULD consider the computing load of the SFs and the cost or latency of the network paths between the DCs hosting the SFs, in order to achieve a good service experience and an optimal end-to-end network path.

In Figure 2, an enterprise tenant orders an SFC with a chain of two value-added services for its access to Internet service. The sequenced services of the SFC are FW and VO.

            +------+   +------+   +------+
            |DC1   |   |DC2   |   |DC3   |
            |      |   |      |   |      |
            |  FW  |   |  FW  |   |  VO  |
            +------+   +------+   +------+
               |          |          |
               +          +          +
             +----+     +----+     +----+     +----+
     CPE+--->| R1 |+--->| R2 |+--->| R3 |+--->| R4 |+-->internet
             +----+     +----+     +----+     +----+

    Figure 2: Illustration of Computing-aware SFC

The current computing load status of the FW SFs in DC1 and DC2 is as follows: each SF uses 6 vCPUs; the load of the FW in DC1 is 50%, and the load of the FW in DC2 is 20%. Considering the more lightly loaded SF, the computed SFP is: DC2 FW -> DC3 VO. Traffic follows the path: CPE -> R1 -> R2 -> DC2 FW -> R2 -> R3 -> DC3 VO -> R3 -> R4 -> internet

The procedures for SFP creation according to computing power of SFs and network topology may be handled by the control plane as follows:

1. Collect the computing power (i.e., the computing resources and computing load) of the SFs in the DCs.

2. Associate the DC location and computing power of the available SFs with topological information of the network connecting all the data centers, allowing the control plane to construct the overall map.

The following potential solutions could be considered:

3. Compute the actual sequence of specific routers and selected SFs in the network for the SFP.

If the same SF is deployed in multiple data centers, the control plane selects one SF instance for the SFP, considering the computing load of the SF and the cost or latency of the network paths between the DCs hosting the SFs.

4. Deliver the actual computed path, called the Rendered Service Path (RSP), to the routers to steer the traffic from the classifier to the destination.

In some cases, SFP adjustments may need to be handled, for example, when an SF in the selected DC fails, when the load of the same SF varies greatly across DCs, or when delay increases among the routers connected to the DC.
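The control-plane steps above can be sketched for the topology of Figure 2, assuming a linear router chain R1-R4, DCs attached as drawn, and a hypothetical cost equal to the number of hops plus ten times the summed SF load. The VO load of 0.30 is an assumed value (this document does not give it), and the function names are illustrative. With equal hop counts, the more lightly loaded FW in DC2 wins, and the rendered path matches the one described earlier in this appendix.

```python
from itertools import product

ROUTERS = ["R1", "R2", "R3", "R4"]  # linear router chain of Figure 2

# Steps 1 and 2 (hypothetical data): SF type -> instances as (DC, attached router, load).
SF_INSTANCES = {
    "FW": [("DC1", "R1", 0.50), ("DC2", "R2", 0.20)],
    "VO": [("DC3", "R3", 0.30)],  # VO load is an assumed value
}


def walk(a, b):
    """Routers traversed when moving from router a to router b (a itself excluded)."""
    i, j = ROUTERS.index(a), ROUTERS.index(b)
    step = 1 if j >= i else -1
    return [ROUTERS[k] for k in range(i + step, j + step, step)]


def render_path(chain, ingress="R1", egress="R4"):
    """Step 4: hop-by-hop Rendered Service Path for one choice of SF instances."""
    path, here = ["CPE", ingress], ingress
    for sf, (dc, router, _load) in chain:
        path += walk(here, router) + [f"{dc} {sf}", router]  # detour into the DC and back
        here = router
    return path + walk(here, egress) + ["internet"]


def path_cost(chain, load_weight=10.0):
    """Hypothetical cost: number of hops plus weighted sum of SF loads."""
    hops = len(render_path(chain)) - 1
    return hops + load_weight * sum(load for _, (_, _, load) in chain)


def compute_sfp(sf_types):
    """Step 3: evaluate every combination of SF instances and keep the cheapest."""
    options = [[(sf, inst) for inst in SF_INSTANCES[sf]] for sf in sf_types]
    best = min(product(*options), key=path_cost)
    return render_path(best)


print(" -> ".join(compute_sfp(["FW", "VO"])))
# CPE -> R1 -> R2 -> DC2 FW -> R2 -> R3 -> DC3 VO -> R3 -> R4 -> internet
```

Exhaustive enumeration is only practical for short chains with few instances per SF; it is used here for clarity, not as a recommended path computation algorithm.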

Authors' Addresses

Kehan Yao
China Mobile
Dirk Trossen
Huawei Technologies
Mohamed Boucadair
Orange
Luis M. Contreras
Telefonica
Hang Shi
Huawei Technologies
Yizhou Li
Huawei Technologies
Shuai Zhang
China Unicom